

A JOURNAL DEVOTED TO THE DEVEL- 
OPMENT OF PSYCHOLOGY AS A 
QUANTITATIVE RATIONAL SCIENCE 


THE PSYCHOMETRIC SOCIETY - ORGANIZED IN 1935 


Psychometrika 
VOLUME 10
NUMBER 4 
DECEMBER 


PSYCHOMETRIKA, the official journal of the Psychometric Society, is devoted to the 
development of psychology as a quantitative rational science. Issued four times 
a year, on March 15, June 15, September 15, and December 15. 


DECEMBER 1945, VOLUME 10, NUMBER 4 


Printed for the Psychometric Society at 28 West Colorado Avenue, Colorado 
Springs, Colorado. Entered as second class matter, September 17, 1940, at the 
Post Office of Colorado Springs, Colorado, under the act of March 8, 1879. Edi- 
torial Office, College Entrance Examination Board, Princeton, New Jersey. 


Subscription Price:

To non-members, the subscription price is $5.00 per volume of four issues. 
Members of the Psychometric Society pay annual dues of $5.00, of which 
$4.50 is in payment of a subscription to Psychometrika. Student members of 
the Psychometric Society pay annual dues of $3.00, of which $2.70 is in
payment for the journal.

The subscription price to libraries and other institutions is $10.00 per year; 
this price includes one extra copy of each issue. 


Applications for membership and student membership in the Psychometric
Society should be sent to
THELMA GWINN THURSTONE 
Chairman of the Membership Committee 
University of Chicago 
Chicago, Illinois 


Membership dues are payable to 
IRVING D. LORGE
Treasurer of the Psychometric Society
425 West 123rd Street
New York, New York 


Individual and library subscriptions are payable to, and 
change of address should be sent to 
IRVING D. LORGE
Treasurer of the Psychometric Corporation
425 West 123rd Street 
New York, New York 


Articles on the following subjects are published in Psychometrika: 


(1) the development of quantitative rationale for the solution of
psychological problems;
(2) general theoretical articles on quantitative methodology in the social and
biological sciences;
(3) new mathematical and statistical techniques for the evaluation of
psychological data;
(4) aids in the application of statistical techniques, such as monographs,
tables, work-sheet layouts, forms, and apparatus;
(5) critiques or reviews of significant studies involving the use of quantita- 
tive techniques. 
(Continued on the back inside cover page) 


Psychometrika


CONTENTS


SPEARMAN AS I KNEW HIM
KARL J. HOLZINGER


THE PREDICTION OF CHOICE
L. L. THURSTONE


A BASIS FOR ANALYZING TEST-RETEST RELIABILITY
LOUIS GUTTMAN


A SIMPLE ORTHOGONAL MULTIPLE FACTOR
APPROXIMATION PROCEDURE
HILDING B. CARLSON


APPENDIX A—REPORT OF THE TREASURER OF
THE PSYCHOMETRIC SOCIETY


APPENDIX B—REPORT OF THE TREASURER OF
THE PSYCHOMETRIC CORPORATION


APPENDIX C—REPORT TO THE PSYCHOMETRIC
SOCIETY BY ITS COMMITTEE ON REORGANIZATION
AND DIVISIONAL STATUS

VOLUME TEN DECEMBER 1945 NUMBER FOUR 




PROFESSOR CHARLES SPEARMAN 


(Reprinted by permission from the Journal of Consulting Psychology 1941, 5: facing
page 97).




PSYCHOMETRIKA—VOL. 10, NO. 4 
DECEMBER, 1945 


SPEARMAN AS I KNEW HIM 


KARL J. HOLZINGER 
UNIVERSITY OF CHICAGO 


The Psychometric Society owes Charles Spearman (1863-1945) 
a great debt of gratitude because, as our new president Cureton re- 
cently remarked, there could hardly have been any such society with- 
out Spearman’s fundamental work in statistical psychology as its 
foundation. He was not only a great scientist but also one of the most 
charming gentlemen I have ever met.

In 1922 I acquired a degree, a little money, and an intense desire 
to study with two men who were undoubtedly the greatest of their 
time in statistics and statistical psychology, Karl Pearson and C. 
Spearman. Upon arriving at the University of London, I presented 
myself to Karl Pearson with the statement that I was there to learn 
statistics from him and statistical applications from Spearman. Pear- 
son was a keen, aristocratic man with not too great tolerance for in- 
ventive statistics beyond that originating in his laboratory. My state- 
ment about studying with Spearman was greeted without enthusiasm, 
and I was also shortly to learn that attendance at the Galton Labora- 
tory was expected from 9:30 A.M. to 6:30 P.M. with a short time off 
for lunch and also for tea at 4:30, the latter referred to by American 
students as the “Silent Hour.”

Studying with Spearman thus became something of a problem, 
but lunch hours could be prolonged a bit, and Saturday afternoons, 
Sundays, and evenings were still available. When I met Spearman 
for the first time, I was charmed by his manner and enormously flat- 
tered that he cared to bother with a mere student at luncheons, din- 
ners, and cafes, where we talked statistics and applications to psy- 
chology. My great thrill came when he asked me to work with him on 
the tetrad formulas, and for once in my life I omitted sleep entirely 
so as to present him with something the next morning. I think this 
willingness to “collaborate” with a youngster on a statistical prob- 
lem he could easily have worked alone illustrates the bigness of the 
man. I had travelled a long way to study with the master, and he 
was willing to provide me with the greatest stimulation a student 
can obtain. Pearson did the same sort of thing, but he was much 
more likely to work out the problem for you and hand you the answer
the next morning. This procedure was obviously not so much fun. 




My next personal contacts with Professor Spearman occurred 
when he came to the University of Chicago as a visiting professor 
and in connection with the Unitary Traits Committee under the chair- 
manship of Professor E. L. Thorndike. Spearman loved America and 
Americans and accepted without adverse comparisons and with 
enthusiasm our customs and way of life. 

On one of his visits I tried to lodge him at a faculty club, but he 
preferred to stay at International House, where he could meet not 
only Americans but also a gay cross section of students from many 
other countries. Here he became as always a general favorite and in 
the lounge and dining-room was surrounded by groups charmed by 
his enthusiasm, friendliness, and ready wit. 

During one of his other visits to Chicago, while my family was 
away, he decided to “batch” it with me and demonstrate his skill in 
cooking omelettes. My duty was coffee and toast, while he was supposed
to provide eggs in various forms. The fact that they all came
out scrambled left him undisturbed and amused. We did not talk sta- 
tistics in the morning, but discussed at length such topics as the merits 
of settling baseball arguments by hitting umpires, the antics of Chi- 
cago gunmen, and the parts of America he wanted to see. 

Spearman completely and unconsciously refuted the proverbial 
idea of the Englishman who never saw the fine points of an amusing 
situation or incident. On one occasion a car full of professors, includ- 
ing Spearman, violated some traffic rule, and an Irish policeman asked 
us to “pull over.” We told him humbly that we were very sorry and 
tried by flattery and bribery to get in his good graces. Finally the 
policeman said, “You can get along now, but don’t try to excuse your- 
selves because you were careless. The trouble with people like you 
is you just don’t care.” We drove away stunned, and some minutes 
later Spearman, who had been silent during this by-play, said with 
a twinkle in his eye, “For an Irish cop, the distinction made between
being careless and just not caring was extremely refined.” 

In the early 1930’s the Unitary Traits Committee had a number 
of meetings that eventuated in subcommittees for the study of fac- 
torial problems. Professor Spearman and I were assigned to do some 
experiments at Chicago while he was a visiting professor. Spearman 
provided over a hundred tests of various types for these experiments, 
as well as the general plans and funds of his own for the studies. 
After his return to England a number of Preliminary Reports were 
prepared at Chicago in anticipation of a summary report by Spearman
at a later date. The onset of World War II and Spearman’s services
as a test expert held up this project to the time of his death.

I should like to turn next to Spearman’s contributions to statistical
psychology. Chief among these works are the fundamental ideas
involved in the application of correlation methods to psychological 
data in his famous paper of 1904, “General Intelligence Objectively 
Determined and Measured;” correlation of sums and differences; the
1914 paper including a formula for obtaining common-factor weights;
his various formulas for correlation of psychological data based on
ranks; and his numerous important papers on the Two-Factor Theory
of Intelligence. 

The paper on correlation of sums and differences is of course the 
basis of all the modern theory of treating fallible data. Among the 
numerous applications by Spearman himself are the formulas for 
attenuation and the Spearman-Brown prophecy formula. There have 
been scores of other theoretical and practical applications of this im- 
portant formula by many psychologists. 

Spearman’s greatest statistical contribution to psychology, and 
to statistical theory as well, is his monumental work of factor theory. 
He is first of all to be credited with the important idea of explaining 
intellectual traits not in terms of the gross scores themselves but as 
underlying factors resolvable from the scores. A factorial solution as 
we now describe it consists of two parts: first, a linear description 
of the tests in terms of underlying common and unique factors, which 
is known as a pattern; and second, the estimation of factor scores for 
individuals, resulting from the first analysis. For both of these as- 
pects of the solution some form of the method of least squares (or its 
equivalent) is employed. All these ideas were contributed by Spear- 
man in his Two-Factor Theory. In addition to all the above statistics, 
he provided satisfactory sampling tests for the adequacy of his two- 
factor solution. His method is thus not only highly original but is as 
complete as anything invented up to this time. 

Spearman’s interpretation of a set of mental variables as due to 
a single general factor “g” and a number of specific factors “s” is
familiar to all students of psychology, but an understanding of what 
Spearman meant by “g,” and what many others think it means, has 
led to confusion, doubt, and a variety of mystical, statistical, and psy- 
chological interpretations. Many of these confusions arose from early 
ignorance of the nature of scientific method. It was often thought 
that if a man had a psychological theory such as Spearman’s Two- 
Factor Theory and subsequently verified it within the limits of sam- 
pling error, this had to be the one and only interpretation of the psy- 
chological data. We now know this to be absurd. It was also thought 
that since “g” was a single letter in a mathematical equation it had 
to be something pure and indivisible. Twenty years ago I had a public
debate with a psychologist about whether “g” could be a single
unique mathematical entity and at the same time something very complex
psychologically. In the Unitary Traits Committee of 1931-34 we
talked about “pure” factors like chemical elements, but since the 
atomic bomb perhaps this foolish idea may also be exploded. Another 
common misinterpretation of “g” held by some today is that any set 
of variables satisfying the tetrad relationships (rank one in modern 
terms) must yield an intellectual general factor. With the application 
of factor analysis to any set of variables this idea is also of course 
absurd. 

Spearman was one of the first to distinguish between a psycho- 
logical theory and a statistical interpretation of it. He talked fre- 
quently and wrote on this point with a much clearer view in my opin- 
ion than several modern factor analysts. Of course, Spearman was 
interested in the general factor and in a sense “explained away” the 
group factors, but he was well aware of the existence of the latter. 
For the kind of tests available at the time he proposed his theory, 
the two-factor explanation was adequate. With the invention of much 
more varied types of mental tests, group factors came into greater 
prominence after 1930, and a number of American and English psy- 
chologists invented factorial systems to account for such group fac- 
tors. Being psychologists, however, these workers had their attention 
focussed on the interpretation of their current interest, group factors, 
and began to “explain away” the general factor. 

One such psychologist passed over the general factor in one book 
as “maturity, race, sex, heterogeneity, etc.” but a few years later he 
pledged himself to the method of principal components which takes 
out all these and something more in the first factor. Another psy- 
chologist has extended Spearman’s statistical theory to a type of 
group factors which are defended because as compared with all other 
systems they are most psychologically meaningful, and they behave 
as nature does. In this case Spearman’s general factor is admitted 
finally but called a “second-order factor.” An English psychologist 
and I have also evolved a method which retains Spearman’s general 
factor but adds group factors as well. Most factor analysts now agree 
that all these systems are statistically admissible, that they may be 
transformed into one another, and that some form of the general fac- 
tor is necessary. 

I should like to emphasize again the importance of Spearman’s 
statistical contributions to factor analysis compared with that of mod- 
ern analysts. As pointed out, he provided the complete scheme and 
method for a single general factor and unique factors. Recent factor 
analysts have brought the group factors into prominence by borrowing
many mathematical ideas such as matrix theory and geometry
that lend to their findings an air of scientific elegance. By leaning
heavily on the mathematicians, they can produce a barrage of for- 
mulas that are very impressive. By deft phraseology they can argue 
for one system over another, but no one in my opinion did a better or 
more complete job both statistically and psychologically than Spear- 
man in his Two-Factor Theory, and none of the modern work would 
have been possible without the firm foundation he provided. In the 
field of mental ability, the general factor does emerge in one form or 
another after awhile. It may be stoutly denied for a time or explained 
away as the correlation amongst factors, but sooner or later it ap- 
pears as a very convenient form of interpretation. Spearman con- 
tributed not only the basic methods of factor analysis but also a 
general factor that has come with us to stay. 

Spearman will long be remembered as one of the outstanding 
psychologists of all time and as a truly fine and great man. He was 
an admirable teacher with whom to study and a stimulating man 
with whom to work. The onset of the war and the loss of his avia- 
tor son at Crete saddened his last years, but for the most part his 
long life was not only brilliantly successful professionally but also 
an enviably happy one because of the breadth and warmth of his 
human understanding and relationships. 




THE PREDICTION OF CHOICE 
L. L. THURSTONE 
UNIVERSITY OF CHICAGO 


This paper is concerned with a central concept in social meas- 
urement such as opinion polls, the measurement of attitudes, the 
prediction of political elections, the measurement of moral values, 
the measurement of consumer preferences, the measurement of utility,
and the measurement of aesthetic values. The concept is that
of the discriminal dispersion and its interesting effects in the prediction
of choice.


We shall describe first some special cases to illustrate the nature 
of the problem and we shall develop several psychophysical theorems 
in the prediction of choice and a method of computation. 


FIGURE 1 


As an example, let there be six psychological objects which may 
be as many candidates for elective office. To simplify the situation 
in this special case let all of the six candidates be equally popular, on
the average, so that their scale positions on the subjective continuum
are all the same, namely, $S_i$, as shown in Figure 1. Let us assume
that candidate A does or says something by which a thousand people 
become his enthusiastic supporters and that, at the same time, an- 
other thousand people hate him. Let this process continue in such a 
manner that the first candidate attains a large discriminal disper- 
sion whereas the other candidates, B, C, D, E, F, retain their small 
discriminal dispersions. It is assumed in this example that the mean 
subjective values $S_i$ remain the same during the campaign.

Before considering what will happen at the election, let us suppose
that the six names are presented to the subjects in pairs where
each candidate is paired with every other candidate. Such a list contains
$n(n-1)/2 = 15$ pairs of names. The voters check their preferences
for each pair of names. From such returns we can make a
square table of proportions of judgments, $p_{j>k}$, to show the proportion
of the voters who prefer $j$ to $k$ for every pair of names. In the
present situation we should find that all of the proportions would be
exactly the same, so that $p_{j>k} = .50$. In other words, the table of
proportions of preference would show all of the six candidates to be
equal in average popularity since their scale positions $S_i$ are all the
same.
Now, having the relations of Figure 1, or their numerical equi- 
valents, the problem is to predict what the voters will do when they 
record their first choices for the six names. From the diagram, it is 
evident that the first candidate will get half of the votes and that the 
other five candidates will divide the other half. Hence we should ex- 
pect the following division of the votes: .50; .10; .10; .10; .10; .10. 
The first candidate would have at least a plurality. The reason for 
the apparent discrepancy between the table of proportions for paired 
preferences and the prediction of first choice is in the great differ- 
ences in the discriminal dispersions for the six candidates. The ef- 
fect that has been described in this example can be summarized in 
the theorem that if three or more psychological objects with the same 
average affective value and with symmetric dispersions are compet- 
ing for selection as first choice, then the object which has the largest 
discriminal dispersion will obtain the largest number of votes. The 
effect which is here stated in terms of psychophysical concepts is no 
doubt known to practical politicians and perhaps to advertisers and
to students of consumer preference. With the development of a theoretical
structure for psychological measurement from the more restricted
methods of traditional psychophysics we should have frequent
contacts between theory and the intuitions of experience in practical
affairs if our scientific rationalizations of social phenomena are sound.

FIGURE 2

Another example will now be described. Let Figure 2 represent
the frequency distributions of two leading candidates, $i$ and $j$, and
let them have the same average affective value $S_i = S_j$. Assume that
these distributions are Gaussian on the subjective continuum with
different dispersions so that $\sigma_i > \sigma_j$ as shown in the figure. In this
situation we should expect that the two candidates would draw the
same number of votes so that it would be a matter of chance which of
them actually became the winner. Now let us introduce a dark horse
in the third candidate $k$ whose scale value $S_k$ is lower than that of
the two leading candidates. This situation can be considered under
two cases, namely, (1) the case in which the distributions of $j$ and
$k$ do not overlap, and (2) the case in which the distributions of $j$ and $k$
do overlap. The first case is shown in Figure 2.

Before the candidate $k$ is introduced, the expectation is a tie between
candidates $i$ and $j$. After the introduction of candidate $k$, the
expectation remains unaltered because the entire distribution for $j$
exceeds the entire distribution for $k$. Hence with no overlapping of
$j$ and $k$, the expectation is that candidate $k$ gets no votes at all. The
expected vote is then the same as if the third candidate had not been
introduced.


Case 2 is different as shown in Figure 3. Here there is an overlap
between the less variable $j$ of the two leading candidates and the
candidate $k$. Most of those who perceive $i$ in the lower half of the
affective range have the preference $j > i$. If some of them shift
their first choices to $k$, the result will be a larger decrement in the
votes for $j$ than for $i$. Therefore the expectation is that $i$ will have
a plurality. We can summarize this situation in several theorems.


FIGURE 3


If two candidates have equal average affective values ($S_i = S_j$) and
different dispersions ($\sigma_i > \sigma_j$), and if there is a third candidate with
lower scale value and overlapping dispersion, there will be a plurality
for the more variable of the two leading candidates. In the case
of a threatened tie between two leading candidates, the more variable
of the two candidates can win the election by introducing a less popular
candidate. This is a curious result. Perhaps this effect is also
known to practical politicians. A limiting case is that in which the
average popularity ($S_k$) of the third candidate is equal to that of $i$
and $j$. Then, if $i$ is still the most variable in affective dispersion, he
wins the plurality. This case is covered by the first theorem. These
theorems are not intended to be exhaustive. They are presented primarily
for the purpose of indicating the potentialities of the concept
of discriminal dispersion in the prediction of conduct. The subject
sketched by these examples can be extended analytically into one of
considerable proportions.
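
The theorems stated so far can be checked informally by simulation. The sketch below is not part of the original analysis. It assumes independent Gaussian affective values, and the function name `simulate_first_choices`, the scale values, the dispersions, and the number of simulated voters are all hypothetical choices made for illustration.

```python
import random

def simulate_first_choices(means, sds, n_voters=100_000, seed=0):
    """Share of simulated voters whose highest perceived value falls on each candidate.

    Assumes independent Gaussian affective values (an assumption of this sketch).
    """
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(n_voters):
        perceived = [rng.gauss(m, s) for m, s in zip(means, sds)]
        counts[perceived.index(max(perceived))] += 1
    return [c / n_voters for c in counts]

# First theorem: six equally popular candidates, only the first has a large dispersion.
six_way = simulate_first_choices([0.0] * 6, [1.0] + [0.05] * 5)

# Case 1: k lies entirely below j, so k takes no first choices and i and j tie.
no_overlap = simulate_first_choices([0.0, 0.0, -4.0], [2.0, 0.5, 0.5])

# Case 2: k overlaps j and draws more first choices away from j than from i.
overlap = simulate_first_choices([0.0, 0.0, -0.5], [2.0, 0.5, 0.5])

for label, shares in [("six-way", six_way), ("case 1", no_overlap), ("case 2", overlap)]:
    print(label, [round(x, 3) for x in shares])
```

With these illustrative values the first candidate's share in the six-way race should come out a little under one half, approaching the .50 of the idealized example as the other five dispersions shrink.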

In voting for a first choice in a set of names, physical objects, 
or ideas, the subject sometimes encounters one or more objects that 
are complete strangers to him. He may even omit such candidates 
from consideration. In listing the distributions for computation of 
first choice we have then a reduced frequency for the active prefer- 
ence votes for this object. The residual frequency can be recorded at 
any point on the affective continuum below the active distributions
of the other candidates.


Analytical Method 
We turn now to a more formal consideration of the problem. A
method of computation will be described which covers both of two
general cases, namely, (1) that in which the affective dispersions
are Gaussian, and (2) that in which the affective dispersions take
any form, including bimodality which is characteristic of the latter
phase of a political campaign. The computations will normally be
based on experimental data obtained by the method of successive
intervals. This psychophysical method will be described in improved
form in a separate paper.

FIGURE 4

We begin the analysis with two overlapping Gaussian distributions
on the affective continuum as illustrated in Figure 4. These two
distributions represent two stimuli, $i$ and $j$, with scale values $S_i$ and
$S_j$ and with discriminal dispersions $\sigma_i$ and $\sigma_j$. The distributions will
be assumed to be drawn with unit area so that their ordinates may
be interpreted directly as probabilities with any suitable class intervals.
The probability that any percipient will experience stimulus $i$
at any specified affective value $s$ is then the ordinate of the probability
curve $y_{is}$ at the point $s$. The probability that the same percipient
will experience stimulus $j$ at some value lower than $s$ is then


$$p_{js} = \int_{-\infty}^{s} y_{j}\, ds, \tag{1}$$


which is represented by the shaded area of Figure 4. Assuming that
these probabilities are independent, we have


$$P_{i>j} = \int_{-\infty}^{\infty} y_{is}\, p_{js}\, ds, \tag{2}$$


where $P_{i>j}$ is the probability that $i$ will be perceived higher than $j$,
irrespective of where $i$ is perceived. The values of $P_{i>j}$ are, in fact,
the experimentally given values in the method of paired comparison
in which each stimulus is compared separately with every other
stimulus. This method implies $n(n-1)/2$ comparisons if each stimulus
is not compared with itself and if the space or time orders $ij$
and $ji$ are not differentiated experimentally. When the two orders are
differentiated, we have $n(n-1)$ comparisons, and when we include the
comparison of each stimulus with itself, as is possible when the stimuli
are defined in such a manner that their individual identities are not
recognized by the subject, then we have $n^2$ comparisons for the method of
paired comparison in its complete form. When only one of the stimuli
is chosen as a standard for comparison with each of the other stimuli,
then we have the constant method with $n$ or $(n-1)$ comparisons.
The constant method of traditional psychophysics is a special
case of the more fundamental paired comparison method. The analysis
so far is that of the law of comparative judgment which has been
described in previous publications, but we are here concerned with
the obverse problem of predicting behavior when $S_i$ and $\sigma_i$ are known.

FIGURE 5
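
For the Gaussian case, equation (2) has a closed form that is often convenient. The following is a standard result for independent normal variables rather than something stated in the text above; $\Phi$, the unit normal cumulative distribution function, is a symbol introduced here for illustration.

```latex
% Assumes independent Gaussian affective values, as in equation (2).
% The difference of the two perceived values is itself Gaussian,
% with mean S_i - S_j and variance \sigma_i^2 + \sigma_j^2.
P_{i>j} = \int_{-\infty}^{\infty} y_{is}\, p_{js}\, ds
        = \Phi\!\left( \frac{S_i - S_j}{\sqrt{\sigma_i^{2} + \sigma_j^{2}}} \right)
```

With equal scale values the argument is zero, so $P_{i>j} = .50$ whatever the two dispersions may be, in agreement with the paired proportions of .50 in the six-candidate example.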

Let us now consider a set of three overlapping affective distributions.
Let the three stimuli be denoted $i$, $j$, and $k$ as shown in
Figure 5. As before, the probability that $i$ will be perceived at $s$ is
$y_{is}$. The probability that $j$ will be perceived below $s$ is $p_{js}$, which is
the single cross-hatched area. The probability that $k$ will be perceived
below $s$ is $p_{ks}$, which is the double cross-hatched area. Assuming,
as before, that the probabilities are independent we have


$$P_{i1} = \int_{-\infty}^{\infty} y_{is}\, p_{js}\, p_{ks}\, ds, \tag{3}$$


where $P_{i1}$ is the probability that the stimulus $i$ will be selected as
first choice when all three stimuli are presented. If there are $N$ individuals
we should have


$$N P_{i1} = E_{i1}, \tag{4}$$


where $E_{i1}$ is the expected number of votes for stimulus $i$ when all
three stimuli are presented for selection of one stimulus by each subject.
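
Equations (3) and (4) can be evaluated numerically when the distributions are Gaussian. The following is a minimal sketch under that assumption, using a midpoint rule in place of the integral. The function names, the integration limits, the scale values, the dispersions, and the number of voters are hypothetical and serve only as illustration.

```python
import math

def gaussian_pdf(s, mean, sd):
    z = (s - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def gaussian_cdf(s, mean, sd):
    return 0.5 * (1.0 + math.erf((s - mean) / (sd * math.sqrt(2.0))))

def first_choice_probability(i, means, sds, lo=-20.0, hi=20.0, steps=20_000):
    """Midpoint-rule approximation of equation (3) for stimulus i."""
    width = (hi - lo) / steps
    total = 0.0
    for t in range(steps):
        s = lo + (t + 0.5) * width
        term = gaussian_pdf(s, means[i], sds[i])            # ordinate y_is
        for k in range(len(means)):
            if k != i:
                term *= gaussian_cdf(s, means[k], sds[k])   # area below s for each competitor
        total += term * width
    return total

# Hypothetical scale values and dispersions for the three stimuli i, j, k.
means = [0.5, 0.0, -0.3]
sds = [1.0, 1.5, 0.8]
n_voters = 1000  # N in equation (4)

probs = [first_choice_probability(i, means, sds) for i in range(3)]
expected_votes = [n_voters * p for p in probs]  # equation (4)

# With two stimuli the same routine reduces to equation (2), which for
# independent Gaussians equals the unit normal area below
# (S_i - S_j) / sqrt(sigma_i^2 + sigma_j^2).
pair_numeric = first_choice_probability(0, [1.0, 0.0], [1.0, 2.0])
pair_closed = gaussian_cdf(1.0 / math.sqrt(1.0**2 + 2.0**2), 0.0, 1.0)

print([round(p, 4) for p in probs], [round(v, 1) for v in expected_votes])
```

The same loop extends to any number of competing stimuli, since each additional stimulus only contributes one more factor to the product.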

Instead of dealing with continuous distributions we can restate 
the same relations in summational form for a set of frequencies in 
successive intervals. Let the affective continuum be divided into s 
successive intervals $1, 2, 3, \ldots, m, \ldots, s$, where $m$ denotes the general
interval. It should be noted explicitly that these intervals need
not be equal, as is illustrated in Figure 6. In fact, the method of 
successive intervals is a psychophysical method in which the inter- 
vals are not ordinarily equal. Their magnitudes can be determined 
for each set of observations. However, the problem of prediction of 
choice can be solved without even knowing the relative sizes of the 
successive class intervals. 




FIGURE 6


Let $f_{im}$ denote the frequency with which the stimulus $i$ is perceived
in the interval $m$ so that


$$\sum_{m=1}^{s} f_{im} = N, \tag{5}$$


where $N$ is the number of experimental subjects who perceived and
classified the stimulus $i$. The corresponding relative frequency is
then

$$\frac{f_{im}}{N} = y_{im} \tag{6}$$

in the class interval $m$, so that $y_{im}$ is the probability that the stimulus
$i$ will be perceived in the affective interval $m$.

Let the stimulus i be the one about which we want to predict the 
number of first choices. Let each of the other stimuli be denoted k. 
Then $P_{i>k}$ will denote the proportion of subjects who vote for the
stimulus i as their first choice in comparison with the whole group k. 

The probability that any one of the stimuli $k$ will be perceived
below any specified class interval $m$ is then


$$\sum_{t=1}^{m-1} y_{kt} = p_{k(m-1)} = p_{k<m}, \tag{7}$$


where $t$ denotes successive intervals. The summation is here over
$(m-1)$ intervals, since $p_{k<m}$ denotes the probability that any stimulus
$k$ will be perceived below the interval $m$. Since we want to determine
the probability that the stimulus $i$ will be perceived higher than
all of the stimuli in group $k$, we must deal with the whole group $k$.
The probability that all of the $(n-1)$ stimuli $k$ will be perceived
below any designated class interval $m$ is the product of the $(n-1)$
probabilities $p_{k(m-1)}$. Denoting this product $u_{m-1}$ we have

$$u_{m-1} = p_{1(m-1)}\, p_{2(m-1)} \cdots p_{k(m-1)} \cdots p_{(n-1)(m-1)} = \prod_{k=1}^{n-1} p_{k(m-1)}. \tag{8}$$

FIGURE 7

If we take the product $y_{im} u_{m-1}$, we have the probability that
the stimulus $i$ is perceived in the interval $m$ and that all of the stimuli
in group $k$ are perceived below the interval $m$. To sum this product
for all the intervals would not give the desired answer because
when the stimulus $i$ is perceived in interval $m$, it may exceed one
or more of the stimuli $k$ which are also perceived in the same interval.
To cover this situation consider Figure 7.

In Figure 7 we have a class interval which is here given the
range 0 to 1 as shown. Let $y_{im}$ be the probability that stimulus $i$ is
perceived in the interval $m$. This ordinate is drawn in the figure as
in a histogram so that we are assuming a rectangular distribution
within each interval. The ordinate $u_m$ is the probability that all of
the stimuli in group $k$ are perceived below the top of the interval $m$.
Then


$$u_m = p_{1m}\, p_{2m}\, p_{3m} \cdots p_{km} \cdots p_{(n-1)m} = \prod_{k=1}^{n-1} p_{km}. \tag{9}$$


The corresponding probability for the bottom of the same class interval
is given by equation (8). Let $x$ be any point in the class interval
$m$ as in the figure. The probability that stimulus $i$ is perceived
at the point $x$ is $(y_{im}\, dx)$ and the probability that, at the same time,
all of the stimuli in group $k$ are perceived below $x$ is $(u_{m-1} + x p_0)$,
where

$$p_0 = (u_m - u_{m-1}). \tag{10}$$

Hence we have


$$P_{im>k} = \int_0^1 y_{im}\,(u_{m-1} + x p_0)\, dx = y_{im}\, u_{m-1} + \tfrac{1}{2}\, y_{im}\, p_0, \tag{11}$$


which is the probability that $i$ is perceived in the interval $m$ and that
all of the group $k$ are perceived below $i$. The desired probability $P_{i>k}$
that $i$ will be an individual voter’s first choice is the summation of
$P_{im>k}$ for all intervals $m$. Expressing $p_0$ in terms of the $u$’s, we have


$$P_{i>k} = \tfrac{1}{2} \sum_{m=1}^{s} y_{im}\,(u_m + u_{m-1}). \tag{12}$$


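This summation can also be written in the form


$$P_{i>k} = \tfrac{1}{2} \sum_{m=1}^{s} \left[\, y_{im} + y_{i(m+1)} \right] u_m = \tfrac{1}{2} \sum_{m=1}^{s} z_{im}\, u_m, \tag{13}$$

where

$$z_{im} = y_{im} + y_{i(m+1)}. \tag{14}$$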

It should be recalled that this prediction of first choice for any 
given stimulus i can be made even though the class intervals remain 
unknown as to their relative magnitudes. Further, this computing 
formula has no restriction as to the shapes of the affective distribu- 
tions. Several of them might be skewed while others are bimodal. 
A reservation about this computing formula is that it assumes inde- 
pendent probabilities. In dealing with affective discrimination for 
a large class of psychological objects this assumption is valid. If the 
affective deviations of a pair of objects are correlated, then the analy- 
sis becomes more complex. It can be made in terms of experimental 
data for the method of successive intervals. When the psychophysical 
problem is concerned with repeated judgments by the same individual,
then the assumption of independence is valid except when the
experimental situation introduces affective contrast or related effects.
These can usually be avoided with good experimental procedures.


Numerical Examples

In Table 1 we have tabulated the distributions for three stimuli
on the affective continuum as they might be found from the
experimental method of successive intervals. Here we have chosen
arbitrary bimodal and skewed distributions to illustrate the latitude of
the method. In this problem it is not necessary to evaluate the
relative sizes of these intervals. They are denoted merely as successive
intervals from 1 to 9, inclusive. The number of intervals is arbitrary.
The probabilities $y_{1m}$, $y_{2m}$, $y_{3m}$ are listed as shown. Next
we compute the corresponding values of $p_{im}$. These summations are
made from the lowest intervals so that the value of $p_{im}$ for m = 9
is unity for each stimulus.

TABLE 1
[Cells marked -- are not legible in the source.]

 m   y1m  y2m  y3m   p1m   p2m   p3m    u1m     u2m     u3m    z1m  z2m  z3m
 1   --   --   --    --    --    --     --      --      --     --   --   .02
 2   --   --   --    --    --    --     --      --      --     --   --   .16
 3   --   --   --    --    --    --     --      --      --     --   --   .48
 4   --   --   --    --    --    --     --      --      --     --   --   .68
 5   --   --   --    --    --    --     --      --      --     --   --   .48
 6   --   --   --    --    --    --     --      --      --     --   --   .16
 7   --   --   .02   .69   .88   1.00   .8800   .6900   .6072  .30  .18  .02
 8   .19  .08  .00   .88   .96   1.00   .9600   .8800   .8448  .31  .12  .00
 9   .12  .04  .00   1.00  1.00  1.00   1.0000  1.0000  1.0000 .12  .04  --

 Stimulus   2P       P
 1          .9516    .47
 2          .6029    .30
 3          --       .23
 Sum        2.0154   1.00


The values of $u_{im}$ are next recorded. For example, the value for
$u_{1m}$ in the interval m = 2 is $p_{2m}\,p_{3m}$ = .16 × .02 = .0032.
In the next three columns we list the values of $z_{im}$. For example,
the entry $z_{3m}$ for m = 5 is (.34 + .14) = .48.

The final step is to sum the cross products $z_{im} u_{im}$. The sums
give values of $2P_{i>k}$. These are listed separately in the table. The
result gives 47% of the votes for the first stimulus, 30% for the
second stimulus, and 23% for the third. An adjustment of less than 1%
is indicated in the summation. This discrepancy is due to the assumed
linearity of the function u within each class interval. It is small for
a large number of class intervals.

TABLE 2
[Rows 1 to 11 and row 15 are not legible in the source; cells marked --
are not legible.]

 m    yi   yj   yk    pi    pj    pk     ui      uj      uk     zi   zj   zk
 12   --   .19  .09   .50   .50   .93    .4650   .4650   .2500  .20  .38  .14
 13   .10  .19  .05   .60   .69   .98    .6762   .5880   .4140  .19  --   .06
 14   .09  .15  .01   .69   .84   .99    .8316   .6831   .5796  .17  .24  .02
 15   --   --   --    --    --    --     --      --      --     --   --   --
 16   --   .05  .00   .84   .98   1.00   .9800   .8400   .8232  .12  .06  .00
 17   .05  .01  .00   .89   .99   1.00   .9900   .8900   .8811  .09  .02  .00
 18   .04  .01  .00   .93   1.00  1.00   1.0000  .9300   .9300  .07  .01  .00
 19   .03  .00  .00   .96   1.00  1.00   1.0000  .9600   .9600  .05  .00  .00
 20   --   .00  .00   .98   1.00  1.00   1.0000  .9800   .9800  --   .00  .00
 21   .01  .00  .00   .99   1.00  1.00   1.0000  .9900   .9900  .01  .00  .00
 22   .00  .00  .00   .99   1.00  1.00   1.0000  .9900   .9900  .01  .00  .00
 23   .01  .00  .00   1.00  1.00  1.00   1.0000  1.0000  1.0000 .01  .00  .00
 24   .00  .00  .00   1.00  1.00  1.00   1.0000  1.0000  1.0000 .00  .00  .00

 Candidate   2P       P
 i           .9634    .48
 j           .9109    .45
 k           .1327    .07
 Sum         2.0070   1.00

In Table 2 we have a numerical example of the theorem that
when two psychological objects are tied in average popularity, as
measured by the mean scale positions $S_i$ and $S_j$, then the more
variable of them can win election for first choice by the introduction
of a third competing object of lower average popularity. Here we used
24 successive intervals. All three of these affective distributions were
made Gaussian, and it is here assumed that the distributions are at
least roughly symmetric. The first two candidates are the leading
ones that are tied. The third candidate has a lower average popularity
as shown in the columns $y_{im}$, $y_{jm}$, $y_{km}$. The computations
are similar to those of the previous example and we have the following
results:

    Expected votes for only two candidates:
        i = 50%;  j = 50%.
    Expected votes after introducing new candidate:
        i = 48%;  j = 45%;  k = 7%.


Here the more variable candidate i has obtained a plurality by in- 
troducing a new candidate with lower average popularity. 

The examples have been limited to groups of two or three ob- 
jects but the theory can be extended to groups of any size. For large 
groups, the computational procedure can be rearranged in a more 
economical manner. 
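The computing formula (12) can be sketched in Python as follows. This is only an illustrative sketch under the independence assumption stated above. The function name `first_choice_shares` and the three five-interval distributions are hypothetical and are not the data of Table 1 or Table 2; they were chosen so that i and j have the same mean, i is the more variable, and k has a lower mean.

```python
from itertools import accumulate
from math import prod


def first_choice_shares(y):
    """Estimate first-choice proportions by formula (12).

    y[i][m] is the proportion of judgments placing stimulus i in
    successive interval m (each row is assumed to sum to 1).
    """
    n = len(y)
    # Cumulative proportions p_im: stimulus i at or below interval m.
    p = [list(accumulate(row)) for row in y]
    shares = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        u_below = 0.0  # u at the bottom of the lowest interval is zero
        for m, y_im in enumerate(y[i]):
            # u_m: all other stimuli fall below the top of interval m.
            u_top = prod(p[k][m] for k in others)
            total += y_im * (u_top + u_below)
            u_below = u_top
        shares.append(0.5 * total)
    return shares


# Hypothetical distributions (assumptions for illustration only).
y_i = [0.20, 0.20, 0.20, 0.20, 0.20]
y_j = [0.00, 0.25, 0.50, 0.25, 0.00]
y_k = [0.25, 0.50, 0.25, 0.00, 0.00]

two_way = first_choice_shares([y_i, y_j])
three_way = first_choice_shares([y_i, y_j, y_k])
print(two_way)
print(three_way)
```

In this hypothetical case the two-way shares are tied, while the three-way shares favor the more variable stimulus i. As in the tables, the three-way shares need not sum exactly to unity because of the assumed linearity within each interval; they can be divided by their sum if proportions are wanted.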


Implications of Discriminal Dispersion

While the examples of this paper have referred to political elec- 
tions, the psychophysical theory is applicable to the comparison and 
ranking of psychological objects of any kind. Since it is a major
purpose of this paper to indicate some analytical and experimental 
implications of the concept of discriminal dispersion, it may be in 
order to mention briefly its relations to psychological measurement 
of several kinds of objects. 

First, consider the formal laboratory measurement of sensory 
and perceptual functions. In measuring pitch discrimination we can 
get along with the traditional psychophysical methods if we use stim- 
uli that are carefully controlled so as to be quite homogeneous in all 
but the attribute to be discriminated. But suppose we want to study 
pitch discrimination under wide variations in timbre, including 




noises. Then the stimuli are not homogeneous in other attributes and 
they will almost certainly differ widely in discriminal dispersion. It 
would still be possible to measure pitch discrimination under such 
variant conditions and the individual differences so found might be 
of considerable psychological and physiological interest. The older 
psychophysical methods would then be inadequate. The discriminal 
dispersions as well as the scale values can be determined from com- 
plete paired comparison data if the stimuli cover a supraliminal range 
in pitch. 

In the measurement of social attitudes of a group it is not only 
the average affective value of a proposal or idea that is of significance 
but also the dispersion of affective values within the group. It may 
even be possible to define the morale of a group in terms of the sum 
of affective dispersions of all its debatable issues. The effects of
propaganda are no doubt determined in part by the heterogeneity of
affective values which are themselves to be altered by propaganda.
Moral values are essentially affective in nature. The moral code of a 
social group can be described by its affective values in which the 
highest ones are what the group considers to be sacred. Measurement 
of the seriousness of crimes can be made by psychophysical methods 
in which the dispersions are signs of heterogeneity or lack of unity 
in the group and its code. 

Studies have been made of international attitudes by asking the 
subjects to rate their preferences for nationalities in pairs. If some 
of the subjects are given the question "Which of these two
nationalities would you rather associate with?" and if others are asked
the question "Which of these two nationalities would you rather have
your sister marry?", the results will be essentially the same as
regards rank order with a linear relation between the two sets of scale
values for the nationalities. But the dispersions will be widely
different, showing greater discrimination for the second question than
for the first. This is what we should expect. It is another example 
in which social judgments are represented in part by the discriminal 
dispersions with effects that are not always so obvious as in this ex- 
ample. 

International attitudes can be studied with the psychophysical 
methods. This was done with newspaper editorials for the period 
1910 to 1930, which included the period of World War I. The analy- 
sis showed the rate of decline of editorial attitudes toward Germany 
and the rise of attitude toward France and the later recovery of pre- 
war attitudes. It seems likely that if such studies were made in a 
manner to reveal the group dispersions as well as the scale values, 
the result might be a rather sensitive barometer of increasing
heterogeneity in international attitudes. Rapid changes in scale [...]

The measurement of consumer preferences should be done by
psychophysical methods that yield central tendencies and group
dispersions as two parameters for each object. The experimental
methods are easily used and they can yield predictions of relative
consumption of competing commodities.

In the measurement of utility the psychologists and the economists
are dealing with overlapping problems. The utility concept is
essentially the same as that of mean affective value for the individual.
The addition of a parameter for dispersion could lead to interesting
results for psychological and economic theory. Consider, for example,
a surface whose base co-ordinates represent amount of two commodities
and whose ordinates represent the associated utility or affective
value for the individual. Horizontal sections of this surface give
a family of indifference curves whereas vertical sections, parallel to
either base axis, give satisfaction curves which are interpreted as
Fechner's law. Now, if the dispersions in affective values among the
individuals are introduced as new parameters, we have the possibility
of summing the effects for the individuals to that of the group.
Further, the first derivative of the satisfaction curve at any point is
the motivation of the individual with reference to the commodity
concerned.

The application of psychophysical methods in experimental aesthetics
is well known. The addition of a parameter for group dispersion
would be indicative of the heterogeneity of aesthetic criteria
for each object. Aesthetic theory is often regarded as a subject to
be settled by scholarly debate or by reference to Aristotle. It would
be better to test aesthetic theory by reference to experimental
methods that are available. We could then find out where Aristotle
guessed right and where he guessed wrong. Here we are assuming that
aesthetics is not normative. We assume that the aesthetic value of an
object is determined by what goes on in the mind of the percipient.
What is an aesthetic object for one percipient is prosaic to another.
In a homogeneous culture, it should be possible by experimental means
to describe, and even to measure, those object attributes which have
aesthetic value for most individuals in that culture.

The purpose of the psychophysical methods is to allocate each
one of a set of psychological objects to the subjective continuum
which is also called the discriminal continuum. Measurement in this
continuum is effected with a subjective unit of measurement, namely,
the discriminal error, with criteria for internal consistency that
differentiate measurement from mere rank order. The psychological
objects may be any objects or ideas about which the subject can make
comparative judgments in the form "A is x'er than B," where x is
any designated attribute. When each psychological object i has been
allocated, it is described by two parameters, namely, its mean scale
position $S_i$ and the discriminal dispersion $\sigma_i$ which the
stimulus projects on the subjective continuum. The purpose of this
paper has been to consider the obverse problem of predicting the
behavior of the subjects in terms of these parameters. In particular we
wanted to predict the relative frequency with which the subjects select
any designated object as their first choice.

William James said that psychophysics was the dullest part of 
psychology. He was right. But if we extend and adapt the psycho- 
physical concepts to the theory of discriminatory and selective judg- 
ment, the subject takes a different color and it is no longer dull. 


The Case of Prediction with Correlated Ratings 

In the previous sections of this paper the writer has presented 
several psychophysical theorems concerning the problem of predict- 
ing how many people will select any given stimulus as their first 
choice when the scale values and discriminal dispersions are known 
for each stimulus. Those theorems were written with the explicit 
reservation that they assumed independent probabilities, i.e., they as- 
sumed the scale values of the stimuli to be uncorrelated. In practice 
it is to be expected that the scale values will be correlated for many 
types of preferential judgments, and it is desirable, therefore, to
devise methods of prediction of choice that are free from this
restrictive assumption. Here we shall describe a method of prediction of
choice which is independent of the shapes of the affective distribu- 
tions and also independent of their intercorrelations. 

The problem is, then, to predict the proportion of voters (or 
buyers) who will select any particular psychological object (person, 
idea, or thing) as their first choice in a group of such objects. It will 
be assumed that there is available a random and representative sam- 
ple of individuals from the population about which the prediction is 
to be made. It will be assumed that we want to determine the scale 
values and discriminal dispersions of a large collection of objects but 
that the actual presentation to the total population will be a smaller 
group of such objects. Let the large collection contain N objects or 
stimuli which will be referred to as the collection. Let the smaller 
group to be presented to the total population contain n objects which
will be referred to as the group of stimuli. The reason for this for- 
mulation is that it may be desirable to select the group of stimuli for 
selection by the population on the basis of information concerning 




relative popularity and dispersions of a large collection of objects. 
When these are known, the smaller group can be selected with fore- 
sight as to the desired objectives. For example, the stimulus collec- 
tion may consist in a set of principles and we may want to predict 
which set or group of these principles can be combined into a pro- 
posal that the population will accept with a majority endorsement. 
This requires a labeled neutral point in the method of successive in- 
tervals. Or the stimulus group may consist of ten neckties whose 
patterns and colors should be so selected from a large collection as to 
maximize acceptance by the population. The scale values and disper- 
sions may be examined to select the most appropriate smaller com- 
bination which shall constitute the stimulus group for general pre- 
sentation. 
If each individual may select one necktie, then the group of neckties
should be so assembled from the available collection as to maximize
the scale values of those which are rated as first choices. This
solution is not the same as that of assembling a group of stimuli with 
the highest scale values. Such a group may please a number of indi- 
viduals with more than one stimulus of their favorite kinds while 
others remain disappointed. The problems in this area are not only 
of theoretical interest in the adaptation of psychophysics to practical 
affairs. Their solutions may also bear interesting relations to
problems of social theory.

The experimental procedure is to ask each subject to place each 
stimulus in one of seven successive categories. The procedure is su- 
perficially similar to that of the method of equal appearing intervals 
but there is an important difference in the instructions to the subject. 
In the method of equal appearing intervals, the subject is asked to 
place the stimuli in the several piles so that they seem to him to be 
about equally spaced as to the attribute in which the stimuli are being 
rated. It can be shown experimentally that the subject is unable to 
carry out such instructions and that in fact the intervals toward the 
ends are greater than the intervals in the middle range. In the meth- 
od of successive intervals, we do not impose any such restriction. 
The successive intervals are either given successive numbers, 1, 2, 3, 
etc., or they are given descriptive names. When the successive classes
are identified by descriptive phrases, the wording should be carefully
done so that the subjects accept immediately the successive nature of
the categories. If the subjects object that the labels on the successive
classes are out of sequence, then the experimental procedure fails. It
is usually rather easy to designate the successive categories in such a 
way that their order is accepted by all of the subjects. They then 
proceed to place each stimulus in one of the classes according to the 




labels without any restriction that the intervals shall be in any sense 
equal. There is of course no restriction as to the shape of the
distribution of judgments in this successive interval classification.
Occasionally, students of psychology need to be reminded that the normal
distribution of categories has nothing to do with this case. 

The computational procedure is rather simple. For each subject 
we record the particular stimuli that he rated in the highest category 
which he used. In addition we record the number of stimuli in that 
category. This category need not be the highest in the set because a 
subject may leave blank one or more of the upper categories. It is the 
highest category in which he rated any stimuli. 

From these basic data we tabulate $n_{jm}$, which is the number of
subjects who placed stimulus j in the highest category in a group of
m stimuli that were rated in the same category. The highest category
is not necessarily the same for all subjects. Thus $n_{j3}$ means the
number of subjects who placed j in their highest class and who placed
a total of three stimuli in that class.

In estimating the number of subjects who would rate stimulus 
j as their first choice we make several plausible assumptions. First, 
we assume that those who rated j in their highest interval or category 
and who placed all other stimuli in lower classes would be expected 
to rate j as their first choice. Further, we assume that those who 
placed j and one other stimulus k in the highest interval would divide 
their votes for first choice between j and k with equal frequencies. 
Similarly, those who placed j in the highest interval together with a
total of m stimuli in that interval would also divide evenly their se- 
lections for first choice among the m stimuli. This assumption is not 
quite correct, but it is very nearly so if the total number of categories 
is not too small. Theoretically one could differentiate among the m 
stimuli in the highest class by the shapes of the frequency distribu- 
tions, treating them as continuous frequency curves rather than as 
histograms, but such a correction is probably not worth the trouble 
in most practical work. 

These assumptions lead to an estimation formula which can be 
written in the form 


$$N_j = n_{j1} + \tfrac{1}{2}\, n_{j2} + \tfrac{1}{3}\, n_{j3} + \cdots + \tfrac{1}{m}\, n_{jm} + \cdots + \tfrac{1}{t}\, n_{jt}, \tag{15}$$

where
    $N_j$ = number of subjects who give stimulus j their first choice,
    t = largest number of stimuli which any subject put in his
        highest interval.

Summing, we have

$$N_j = \sum_{m=1}^{t} \frac{1}{m}\, n_{jm}, \tag{16}$$

which applies to each stimulus in turn. 

The experimental procedure contemplates the sorting by each sub- 
ject of a relatively large number of stimuli into a relatively small 
number of successive categories. This procedure is feasible, whereas 
it is not feasible to ask a subject to put, say, 50 or 100 stimuli in rank 
order. From the data for the sortings into a rather small number of 
categories, say seven, one can estimate the number of subjects who 
would give their first choice to any particular stimulus j when this
stimulus is presented with any particular combination of other stimuli
in the collection.
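The estimation formula (16) can be sketched in Python as follows. This is a minimal sketch, assuming that each subject's sorting is stored as a mapping from stimulus to category number, with a higher number meaning a more preferred category. The function name `estimate_first_choices`, the argument names, and the three-subject data are hypothetical and serve only as illustration.

```python
from fractions import Fraction


def estimate_first_choices(ratings, group):
    """Apply formula (16) to successive-interval sortings.

    ratings: one dict per subject, mapping stimulus -> category number
             (assumption: higher number = more preferred category).
    group:   the stimuli that are to be presented together.
    Returns a dict mapping each stimulus j to its estimate N_j.
    """
    estimates = {j: Fraction(0) for j in group}
    for subject in ratings:
        # Highest category this subject used among the group's stimuli.
        top = max(subject[j] for j in group)
        tied = [j for j in group if subject[j] == top]
        # The m stimuli in the top category share the vote equally.
        for j in tied:
            estimates[j] += Fraction(1, len(tied))
    return estimates


# Hypothetical sortings of three stimuli by three subjects.
ratings = [
    {"A": 7, "B": 5, "C": 5},
    {"A": 6, "B": 6, "C": 2},
    {"A": 3, "B": 3, "C": 3},
]
all_three = estimate_first_choices(ratings, ["A", "B", "C"])
b_and_c = estimate_first_choices(ratings, ["B", "C"])
print(all_three)
print(b_and_c)
```

Because the same sortings can be re-scored for any subset passed as `group`, one set of data serves for every combination of stimuli drawn from the collection. Exact fractions are used here only so that the estimates sum exactly to the number of subjects.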

Several psychophysical methods and concepts have been described 
in previous papers by the author. Some of these papers are listed 
below. 


REFERENCES 
1. Thurstone, L. L. Equally often noticed differences. J. educ. Psychol., 1927,
   18, 289-93.
2. Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34,
   273-86.
3. Thurstone, L. L. Psychophysical analysis. Amer. J. Psychol., 1927, 38,
   368-89.
4. Thurstone, L. L. A mental unit of measurement. Psychol. Rev., 1927, 34,
   415-23.
5. Thurstone, L. L. Three psychophysical laws. Psychol. Rev., 1927, 34,
   424-32.
6. Thurstone, L. L. Experimental study of nationality preferences. J. gen.
   Psychol., 1928, 1, 405-25.
7. Thurstone, L. L. Fechner's law and the method of equal appearing intervals.
   J. exp. Psychol., 1929, 12, 214-24.
8. Thurstone, L. L. The indifference function. J. soc. Psychol., 1931, 2,
   139-67.
9. Thurstone, L. L. Rank order as a psychophysical method. J. exp. Psychol.,
   1931, 14, 187-201.
10. Thurstone, L. L. Ability, motivation, and speed. Psychometrika, 1937, 2,
    249-54.




PSYCHOMETRIKA—VOL, 10, NO. 4 
DECEMBER, 1945 


A BASIS FOR ANALYZING TEST-RETEST RELIABILITY* 


LOUIS GUTTMAN 


DEPARTMENT OF SOCIOLOGY AND ANTHROPOLOGY 
CORNELL UNIVERSITY 


Three sources of variation in experimental results for a test 
are distinguished: trials, persons, and items. Unreliability is de- 
fined only in terms of variation over trials. This definition leads to a 
more complete analysis than does the conventional one; Spearman’s
contention is verified that the conventional approach—which was for- 
mulated by Yule—introduces unnecessary hypotheses. It is empha- 
sized that at least two trials are necessary to estimate the reliability 
coefficient. This paper is devoted largely to developing lower bounds 
to the reliability coefficient that can be computed from but a single 
trial; these avoid the experimental difficulties of making two inde- 
pendent trials. Six different lower bounds are established, appropri- 
ate for different situations. Some of the bounds are easier to
compute than are conventional formulas, and all the bounds assume
less than do conventional formulas. The terminology used is that of
psychological and sociological testing, but the discussion actually
provides a general analysis of the reliability of the sum of n
variables.


CONTENTS

Part I    The Definition of Reliability
     1. Introduction
     2. Working Formulas for the Lower Bounds
     3. The Definition of Error
     4. The Variation Between Individuals and the Total Variation
     5. The Definition of the Reliability Coefficient
     6. A Comparison with the Conventional Formulation
     7. The Definition of Experimental Independence
     8. The Basic Assumptions
     9. The Assumptions of Experimental Independence
    10. Correlation Between Two Trials

Part II   Lower Bounds for the Reliability Coefficient
    11. The Item Variances and Covariances
    12. A Simple Lower Bound
    13. A Better Lower Bound
    14. An Intermediate Lower Bound
    15. "Split-Half" Lower Bounds
    16. A Lower Bound Based on a Best Row of Covariances
    17. A Lower Bound Based on Linear Multiple Correlation

Part III  Observability from a Single Trial
    18. Observability of Means from a Single Trial
    19. On the Bias of Trial Variances and Covariances
    20. Observability of Variances and Covariances
    21. Observability of the Lower Bounds


*The writer is indebted to the members of his statistical seminar, to
Professor Mark Kac, and to Professor Samuel A. Stouffer and his staff in
the Research Branch, Information and Education Division, War Department,
for their helpful comments on this paper.


Part I. The Definition of Reliability 

1. Introduction. It is now over forty years since Spearman first 
wrote about errors of observation (6). These errors, as he pointed 
out, have the remarkable property that they attenuate a correlation 
coefficient in a manner that cannot be remedied by increasing the 
number of individuals upon whom the correlation is based. Spear- 
man’s work has had great influence on much research in the psycho- 
logical and social sciences. The reliability of instruments like achieve- 
ment tests and attitude questionnaires has become a standard prob- 
lem for investigation. 

The analytical formulation of the problem that is conventional 
today* is that originally submitted by Yule in a letter to Spearman 
(7). It assumes that an observation consists of a “true score” plus 
an error, where the error is assumed to have a zero mean over indi- 
viduals and is assumed to correlate zero with the “true score” over 
individuals. Further assumptions concern zero covariances between 
errors, the covariances being taken over individuals. In publishing 
this formulation, which has become classical, Spearman remarks that 
he believes Yule makes more assumptions than are needed. 

The present paper is devoted to a reformulation of the analysis 
of reliability according to what seems to have been Spearman’s origi- 
nal purpose. It bears out Spearman’s contention that the conventional 
formulation, while simple in its algebra, encumbers the analysis with 
unnecessary hypotheses. Also, the present paper produces new for- 
mulas that not only assume less than do the conventional ones, but are 
simpler to compute in practice. 

The problem of reliability is of course not peculiar to psychology 
or sociology, but pervades all the sciences. In dealing with empirical 
data in any field, the question should be raised: if the experiment 
were to be repeated, how much variation would there be in the re- 
sults? One of Spearman’s important contributions has been to focus 
attention on the reliability of the sum of a number of variables. This
is especially appropriate for psychological and sociological instru- 
ments like achievement and attitude tests which are scored by adding 
up item values. Our treatment stresses the reliability of a sum. 


* See, for example, (1), p. 411. 




The formulation to be presented here differs from the conven- 
tional one in the following respects: 

(1) Error of observation is defined explicitly for each person 
on each item for each trial in a universe of trials. Thus the three 
sources of variation in an experiment are kept distinct: trials, per- 
sons, and items. Unreliability is defined as variation over trials. 

(2) Using this definition, no assumptions of zero means for 
errors or zero correlations are needed to prove that the total variance 
of the test is the sum of the error variance and the variance of ex- 
pected scores; this relationship between variances is an algebraic 
identity. Therefore, the reliability coefficient is defined without as- 
sumptions of independence as the complement of the ratio of error 
variance to total variance. 

(3) A major emphasis of this paper is that the reliability co- 
efficient cannot in general be estimated from but a single trial—that 
items do not replace trials. If two trials are experimentally indepen- 
dent, then we show that the correlation between two trials is, with 
probability of unity, equal to the reliability coefficient. 

(4) As is well known, there may be great practical difficulties 
in making two independent trials; therefore our principal focus is on 
what information can be obtained from a single trial. We find that 
lower bounds to the reliability coefficient can be computed from a 
single trial. Six different lower bounds are derived, appropriate for 
different situations. Several of these bounds are as easy as or easier 
to compute than are conventional formulas, and all of the bounds as- 
sume less than do conventional formulas. 

(5) To prove that bounds can be computed from a single trial, 
we use essentially one basic assumption: that the errors of
observation are independent between items and between persons over the
universe of trials. In the conventional approach, independence is taken
over persons rather than trials, and the problem of observability from 
a single trial is not explicitly analyzed. 

(6) We make no assumptions about the relationships between 
the items themselves, that is, as to what the relationships would be 
if there were no experimental error. 

(7) Proof that bounds can be computed from but a single trial 
(and that the reliability coefficient itself can be computed from two 
independent trials) turns out to involve the notion of convergence in 
the mean, so that the results hold with probability of unity. This kind 
of analysis is required by the problem of reliability. The algebra in 
the last part of this paper may be somewhat tedious, but this should 
not obscure the easy formulas that emerge for use in practice. 

The major practical results of this paper are the formulas for 




the lower bounds to the reliability coefficient, which can be computed 
from but a single trial. These bounds are listed in the next section, 
so that the reader who desires only working formulas can find them 
immediately. The formulas are for large samples; caution should be 
exercised in applying them to small samples. 

The succeeding sections of Part I are devoted to laying down 
basic definitions and notation for the mathematical analysis. In Part 
II, various lower bounds to the reliability coefficient are derived in 
terms of parameters defined over persons and trials. Part III proves 
that these bounds can be computed from but a single trial. 

2. Working Formulas for the Lower Bounds. A fundamental 
fact concerning unreliability is that, in general, it cannot be estimated 
from only a single trial. Two or more trials are needed to prove the 
existence of variation in the score of a person on an item, and to esti- 
mate the extent of such variation if there is any. 

The experimental difficulties in obtaining independent trials have 
led to many attempts to estimate the reliability of a test from only 
a single trial by bringing in various hypotheses. Such hypotheses 
usually do not afford a real solution, since ordinarily they cannot be 
verified without the aid of at least two independent trials, which is 
precisely what they are intended to avoid. 

An important result of this paper is to show that from a single 
trial, while it is not possible to estimate the reliability of a test, it is 
possible to set lower bounds to the reliability coefficient. In practice, 
such lower bounds will often be usefully greater than zero. It is as- 
sumed only that the items are experimentally independent, that the 
population of individuals is indefinitely large, and that the universe 
of (hypothetical) trials is indefinitely large. Since the working for- 
mulas are for large samples, they should be used with caution for 
small samples. 

The reliability coefficient for the test will be denoted by
$\rho_t^2$. In Part II of this paper are developed six lower bounds to
$\rho_t^2$ in terms of parameters defined over all persons and trials;
these bounds are denoted there by $\lambda$ with distinguishing
subscripts. In Part III, it is proved that each $\lambda$ can be
computed from but a single trial with probability unity. The computation
for a $\lambda$ from a single trial we shall denote here by an L with
the corresponding subscript. The L's are the working formulas.

For each of the L's, it is true with probability unity that

$$L \le \rho_t^2.$$

The computations for the L's are as follows.




For a given trial, let $s_1^2, s_2^2, \cdots, s_n^2$ be the variances
over persons of the n items in the test, and let $s_t^2$ be the variance
over persons of the sum of the items. The simple lower bound
$\lambda_1$ (see §12 below) can be computed from the formula

$$L_1 = 1 - \frac{\sum_{j=1}^{n} s_j^2}{s_t^2}.$$

Since $L_3$ below is a better bound (if $L_1 > 0$) and easily computed,
$L_1$ will ordinarily not be used by itself in practice; but it is
helpful in computing and comparing the various lower bounds.

A definitely better lower bound than $L_1$ is $L_2$, which requires
computing the sum of squares of the covariances between items for
the given trial; this sum of squares is denoted by $C_2$. The bound
$\lambda_2$ (§13 below) is computed from the single trial by

$$L_2 = L_1 + \frac{\sqrt{\dfrac{n}{n-1}\, C_2}}{s_t^2} .$$

$C_2$ is the sum of $n(n-1)$ terms; but the covariances are equal in pairs
since the covariance of item j with item k is equal to the covariance
of item k with item j; therefore $C_2$ is simply twice the sum of the
squares of the $n(n-1)/2$ different covariances.
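
A minimal sketch of this second computation follows, under the same
assumptions as any small illustration: the helper names `pcov` and
`guttman_l2` and the data are hypothetical, and covariances use the
divisor N. The sum of squared covariances is accumulated over the
n(n-1)/2 distinct pairs and doubled, as described above.

```python
from math import sqrt


def pcov(xs, ys):
    """Population covariance (divisor N); pcov(xs, xs) is the variance."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def guttman_l2(scores):
    """L2 = L1 + sqrt(n/(n-1) * C2) / s_t^2 for one trial."""
    items = list(zip(*scores))
    n = len(items)
    totals = [sum(row) for row in scores]
    total_var = pcov(totals, totals)
    l1 = 1 - sum(pcov(col, col) for col in items) / total_var
    # C2: twice the sum of squares of the n(n-1)/2 distinct covariances.
    c2 = 2 * sum(pcov(items[j], items[k]) ** 2
                 for j in range(n) for k in range(j + 1, n))
    return l1 + sqrt(n / (n - 1) * c2) / total_var


# Hypothetical data: 4 persons, 3 dichotomous items.
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
l2 = guttman_l2(scores)
print(round(l2, 4))
```

Here the three distinct covariances are 0.125, 0.0625, and 0.125, so
that the doubled sum of squares is 0.0703125 and the bound is about 0.76.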

$\lambda_3$ is derived in §14 by weakening $\lambda_2$ in order to save the
labor of computing covariances. It is better than $\lambda_1$, if the latter
is positive, since it is computed from the formula

$$L_3 = \frac{n}{n-1}\, L_1 .$$

This is so easy to compute that $L_1$ need never be used by itself as a
lower bound.

The relationship between these first three bounds is expressed
by the inequalities:

$$L_1 < L_3 \le L_2 ,$$

the first inequality assuming that $L_1 > 0$.
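
The ordering of the first three bounds can be checked numerically. The
sketch below is an illustration only: the function names, the use of a
full item covariance matrix, and the data are assumptions, and all
moments use the divisor N. The variance of the total score is obtained
as the sum of every entry of the covariance matrix.

```python
from math import sqrt


def cov_matrix(scores):
    """Item-by-item covariance matrix over persons (divisor N)."""
    n_persons = len(scores)
    items = list(zip(*scores))
    means = [sum(col) / n_persons for col in items]
    return [[sum((a - mg) * (b - mj) for a, b in zip(cg, cj)) / n_persons
             for cj, mj in zip(items, means)]
            for cg, mg in zip(items, means)]


def first_three_bounds(scores):
    """Return (L1, L2, L3) computed from a single trial."""
    c = cov_matrix(scores)
    n = len(c)
    total_var = sum(sum(row) for row in c)      # variance of the item sum
    item_var = sum(c[j][j] for j in range(n))   # sum of item variances
    c2 = sum(c[g][j] ** 2 for g in range(n) for j in range(n) if g != j)
    l1 = 1 - item_var / total_var
    l2 = l1 + sqrt(n / (n - 1) * c2) / total_var
    l3 = n / (n - 1) * l1
    return l1, l2, l3


# Hypothetical data: 4 persons, 3 dichotomous items.
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
l1, l2, l3 = first_three_bounds(scores)
print(l1, round(l2, 4), l3)
```

For these data the three values are 0.5, about 0.76, and 0.75, in
agreement with the stated inequalities.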

A conventional formula that attempts to estimate the reliability
coefficient by using a series of assumptions is the well known
“corrected split-half coefficient.” Our lower bound $\lambda_4$ (§15)
resembles this coefficient. Two things, however, must be remembered:
(a) $L_4$ assumes nothing additional to the assumptions stated above in
this section in order to be used from a single trial, and (b) $L_4$
underestimates the reliability coefficient. Furthermore, $L_4$ is easier
to compute than is the “corrected split-half coefficient,” since no
correlation coefficient is explicitly computed. The formula for $L_4$
requires that the test be scored as two halves. The respective
variances of the two parts for the single trial are denoted by $s_a^2$
and $s_b^2$, and the formula is

$$L_4 = 2\left(1 - \frac{s_a^2 + s_b^2}{s_t^2}\right) .$$

$L_4$ is a lower bound no matter how the test is split. It is desirable, of
course, to try to split the test in such a manner as to maximize $L_4$;
$L_4$ will tend to be larger for halves which correlate more highly with
each other.
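
As a hedged illustration of the split-half computation, the sketch
below scores a test as two parts and applies the formula twice the
quantity one minus the ratio of the summed part variances to the total
variance. The function `guttman_l4`, its `half_a` argument (the item
indices assigned to the first part), and the data are assumptions
introduced for the example; variances use the divisor N.

```python
def pvar(values):
    """Population variance (divisor N)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def guttman_l4(scores, half_a):
    """L4 = 2 * (1 - (s_a^2 + s_b^2) / s_t^2) for a given split.

    half_a lists the item indices placed in the first part; every other
    item goes to the second part.
    """
    half_a = set(half_a)
    part_a = [sum(x for j, x in enumerate(row) if j in half_a)
              for row in scores]
    part_b = [sum(x for j, x in enumerate(row) if j not in half_a)
              for row in scores]
    totals = [a + b for a, b in zip(part_a, part_b)]
    return 2 * (1 - (pvar(part_a) + pvar(part_b)) / pvar(totals))


# Hypothetical data: 4 persons, 3 dichotomous items.
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
split_13_vs_2 = guttman_l4(scores, [0, 2])
split_12_vs_3 = guttman_l4(scores, [0, 1])
print(round(split_13_vs_2, 4), round(split_12_vs_3, 4))
```

With these data, placing the first and third items against the second
gives about 0.8, while placing the first two items against the third
gives about 0.6; both are lower bounds, and the larger is the more
informative.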

If $s_a^2 = s_b^2$ for a particular splitting of the test, then $L_4$ is
numerically equal to the “corrected split-half coefficient” for that
split, but it is still a lower bound to the reliability coefficient.
This may help explain why, in the past, the “correction for
attenuation” has sometimes yielded a “correlation” greater than unity;
reliability has often been underestimated by the conventional formula,
so that attenuation has been overcorrected. Many tests are more
reliable than they have been considered to be; and many low
correlations have not necessarily been due to unreliability.

$L_3$ and $L_4$ are both easy to compute and will ordinarily be the
most convenient of the lower bounds to use in practice. For the case
of dichotomous scoring, where items have only the values zero and
one (“wrong” and “right” in an achievement test), then $L_3$ is the
easier of the two to compute, since then

$$s_j^2 = p_j (1 - p_j),$$

where $p_j$ is the proportion of people having the value of unity on the
jth item (the proportion getting the jth item “right”). However,
often it will be easy to find a splitting of the test that will yield an
$L_4$ that will be substantially higher than $L_3$, so that it will be
definitely preferable to compute $L_4$ in such cases.
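
For dichotomous items the shortcut can be sketched as follows. This is
an illustration under assumed names (`l3_dichotomous`) and assumed data,
with variances taken with the divisor N so that the variance of a 0/1
item equals the proportion passing times the proportion failing.

```python
def pvar(values):
    """Population variance (divisor N)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def l3_dichotomous(scores):
    """L3 for 0/1 items, using p_j * (1 - p_j) as each item variance."""
    n_persons = len(scores)
    n_items = len(scores[0])
    proportions = [sum(row[j] for row in scores) / n_persons
                   for j in range(n_items)]
    item_var_sum = sum(p * (1 - p) for p in proportions)
    total_var = pvar([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: 4 persons, 3 items scored wrong (0) or right (1).
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
l3 = l3_dichotomous(scores)
print(l3)
```

The proportions passing are 0.75, 0.5, and 0.25, giving item variances
0.1875, 0.25, and 0.1875 and a bound of 0.75.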


Two further lower bounds that will sometimes be of use are $\lambda_5$
and $\lambda_6$. For the single trial, let $C_{2j}$ be the sum of the squares
of the covariances of item j with the remaining $n-1$ items, and let
$\bar{C}_2$ be the largest of the $C_{2j}$. Then $\lambda_5$ (§16) can be
computed from

$$L_5 = L_1 + \frac{2\sqrt{\bar{C}_2}}{s_t^2} .$$

$L_5$ will be greater than $L_2$, and hence $L_3$, for a test in which one
item has large covariances with the other items compared with the
covariances among those items. Otherwise, $L_5$ is less than or equal to
$L_2$.
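
A short sketch of the computation based on the best single row of
covariances is given below. The names `pcov` and `guttman_l5` and the
data are assumptions for illustration; moments use the divisor N.

```python
from math import sqrt


def pcov(xs, ys):
    """Population covariance (divisor N)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def guttman_l5(scores):
    """L5 = L1 + 2 * sqrt(largest row sum of squared covariances) / s_t^2."""
    items = list(zip(*scores))
    n = len(items)
    totals = [sum(row) for row in scores]
    total_var = pcov(totals, totals)
    l1 = 1 - sum(pcov(col, col) for col in items) / total_var
    # For each item j, sum the squared covariances with the other n-1 items.
    row_sums = [sum(pcov(items[j], items[g]) ** 2
                    for g in range(n) if g != j)
                for j in range(n)]
    return l1 + 2 * sqrt(max(row_sums)) / total_var


# Hypothetical data: 4 persons, 3 dichotomous items.
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
l5 = guttman_l5(scores)
print(round(l5, 4))
```

For these data the largest row sum belongs to the second item
(0.03125), and the bound is about 0.78.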

The sixth of our lower bounds is based on multiple correlation.
For the single trial, let $e_j^2$ be the variance of the errors of estimate
of item j from its linear multiple regression on the remaining $n-1$
items. Then $\lambda_6$ (§17) can be computed from

$$L_6 = 1 - \frac{\sum_{j=1}^{n} e_j^2}{s_t^2} .$$

$L_6$ will tend to be larger than $L_2$, and hence $L_3$, when the items have
relatively low zero-order intercorrelations but high multiple
correlations. Otherwise, $L_6$ will tend to be less than or equal to $L_2$.
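
One way to obtain the regression error variances, sketched here as an
assumption rather than as the paper's own procedure, is the standard
identity that the residual variance of an item regressed on all the
others equals the reciprocal of the corresponding diagonal element of
the inverse covariance matrix. The helper names and the data are
hypothetical, the covariance matrix is assumed nonsingular, and moments
use the divisor N.

```python
def cov_matrix(scores):
    """Item-by-item covariance matrix over persons (divisor N)."""
    n_persons = len(scores)
    items = list(zip(*scores))
    means = [sum(col) / n_persons for col in items]
    return [[sum((a - mg) * (b - mj) for a, b in zip(cg, cj)) / n_persons
             for cj, mj in zip(items, means)]
            for cg, mg in zip(items, means)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (nonsingular input)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def guttman_l6(scores):
    """L6 = 1 - (sum of regression error variances) / s_t^2."""
    c = cov_matrix(scores)
    inv = invert(c)
    total_var = sum(sum(row) for row in c)
    # Residual variance of item j on the other items: 1 / inv[j][j].
    error_vars = [1 / inv[j][j] for j in range(len(c))]
    return 1 - sum(error_vars) / total_var


# Hypothetical data: 4 persons, 3 dichotomous items.
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
l6 = guttman_l6(scores)
print(round(l6, 4))
```

For these data each regression error variance is 0.125, and the bound
is 0.7.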

The probability is unity that the reliability coefficient $\rho_t^2$ is not
smaller than the largest of $L_2$, $L_3$, $L_4$, $L_5$, and $L_6$. And no
matter what these lower bounds may be, they cannot disprove the
hypothesis that $\rho_t^2 = 1$. At least two trials are needed to disprove
such a hypothesis. A single trial can set a minimum to the reliability
coefficient, but not a maximum less than unity.

3. The Definition of Error. A proper analysis of reliability
must begin with a precise definition of error. The errors with which
we are concerned are defined by an indefinitely large universe of
trials. They are defined separately for each individual in a
population for each item (variable) being observed.

Consider a set of n items. Let $x_{ijk}$ be the observation of the ith
individual on the jth item in the kth trial. The expected (mean)
value for this individual on the item over all trials will be denoted by

$$X_{ij} = E_k\, x_{ijk}, \tag{1}$$

where E denotes mean value (mathematical expectation) over the
indicated subscript. The variance of the ith individual on the jth item
over all trials is

$$\sigma_{x_{ij}}^2 = E_k\, (x_{ijk} - X_{ij})^2 . \tag{2}$$

This may be called the error variance of the ith individual on the jth
item.

The test score, or total score, of the ith person on the kth trial
will be denoted by $t_{ik}$, and by definition,

$$t_{ik} = \sum_{j=1}^{n} x_{ijk} . \tag{3}$$

The expected test score over all trials of the ith person will be denoted
by

$$T_i = E_k\, t_{ik} . \tag{4}$$

Taking expectations over trials of both members of (3), and using
(1) and (4), yields

$$T_i = \sum_{j=1}^{n} X_{ij} . \tag{5}$$

The error variance on the test for the ith person is defined as

$$\sigma_{t_i}^2 = E_k\, (t_{ik} - T_i)^2 . \tag{6}$$


4. The Variation Between Individuals and the Total Variation.
Thus far we have defined the variation within a person over the
universe of trials. We shall also need to consider the variation between
persons, which is done in terms of their expected scores over trials.
The mean over persons of the expected test scores is

$$\mu_T = E_i\, T_i , \tag{7}$$

and the variance over persons of the expected test scores is

$$\sigma_T^2 = E_i\, (T_i - \mu_T)^2 . \tag{8}$$

Finally, we need to define the total variation over all persons and
trials. The general mean of the test over all persons and trials,
$\mu_t$, is of course equal to $\mu_T$ because

$$\mu_t = E_i E_k\, t_{ik} = E_i\, T_i = \mu_T .$$

The total variance of the test over all trials and all people is

$$\sigma_t^2 = E_i E_k\, (t_{ik} - \mu_t)^2 . \tag{9}$$

Now,

$$E_k\, (t_{ik} - \mu_t)^2 = E_k\, [(t_{ik} - T_i) + (T_i - \mu_T)]^2
  = E_k\, (t_{ik} - T_i)^2 + (T_i - \mu_T)^2 .$$

Substituting the last member into (9), and remembering (6) and
(8), we get the basic formula:

$$\sigma_t^2 = \sigma_T^2 + E_i\, \sigma_{t_i}^2 . \tag{10}$$
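
The decomposition of total variance into the variance of expected
scores plus the mean of the individual error variances can be
illustrated on a small artificial universe. In the sketch below the
finite table of three persons by two trials, and all names, are
assumptions made purely for illustration (the paper's universe of
trials is indefinitely large); every person has the same number of
trials, which is what makes the identity exact here.

```python
def mean(values):
    return sum(values) / len(values)


def pvar(values):
    """Population variance (divisor N)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Hypothetical universe: universe[i][k] is the test score of person i
# on trial k.
universe = [[1, 3],
            [4, 6],
            [5, 5]]

expected_scores = [mean(row) for row in universe]      # expected score per person
true_var = pvar(expected_scores)                       # variance of expected scores
error_var = mean([pvar(row) for row in universe])      # mean individual error variance
total_var = pvar([t for row in universe for t in row]) # variance over persons and trials

reliability = 1 - error_var / total_var
print(true_var, round(error_var, 4), round(total_var, 4), round(reliability, 4))
```

Here the variance of expected scores is 2, the mean error variance is
2/3, and the total variance is 8/3, so the two components add to the
total and the ratio of expected-score variance to total variance is
0.75.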


5. The Definition of the Reliability Coefficient. We shall define
the reliability coefficient of the test for the population of individuals
to be

$$\rho_t^2 = \frac{\sigma_T^2}{\sigma_t^2}
           = 1 - \frac{E_i\, \sigma_{t_i}^2}{\sigma_t^2} . \tag{11}$$

This definition states in precise terms what seems to be Spearman’s
original intention. That it is equivalent to the “correlation between
two independent trials” will be seen in §10 below.

Obviously $0 \le \rho_t^2 \le 1$. The coefficient is zero only if all*
expected scores are equal. Hence, any test which has any variance at
all in expected scores has some reliability.

6. A Comparison with the Conventional Formulation. The term
$E_i\, \sigma_{t_i}^2$ in (10) is equivalent to what has been called the
“error variance” in previous formulations. Notice, however, the precise
analytical structure of this term: it is the mean of the individual
error variances. No assumption is made that individuals are equally
unreliable. Whatever individual differences there may be in
unreliability, the error term is the mean of the unreliability
variances.

The term $\sigma_T^2$ corresponds to what has been called the “true
variance” in previous formulations. Notice that it is not assumed that

$$E_i\, (t_{ik} - T_i) = 0, \tag{a}$$

that is, that the mean of the errors of a single trial is zero, as is done
conventionally. Nor is it assumed that the errors correlate zero with
the true scores on a single trial:

$$E_i\, (t_{ik} - T_i)(T_i - \mu_T) = 0, \tag{b}$$

as is done conventionally. Formula (10) actually involves no
assumptions except that the variances exist.+ Therefore, definition (11)
defines the reliability coefficient for dependent trials as well as for
independent trials.

Concern about independence of trials arises from the need to set
bounds for $\rho_t^2$ on the basis of one or two trials. Parts II and III of
this paper show how such bounds can be set, given independence
between items and between persons over trials.

* More strictly, all except possibly for an infinitesimal proportion.

+ That (a) and (b) are actually true, except possibly for an infinitesimal
proportion of trials, can be proved using an assumption of independent
trials, using the method of Part III.

7. The Definition of Experimental Independence. The definition
of experimental independence involves the notion of statistical
independence. The general definition of the statistical independence of
two variates, continuous or discrete, is well known, but it may be
helpful to review it here. Let y be a variate with frequency function
$f_1(y)$; let z be a variate with frequency function $f_2(z)$; let $f(y,z)$ be
the joint frequency function of y and z; let $g(y|z)$ be the conditional
distribution of y for fixed z; and let $h(z|y)$ be the conditional
distribution of z for fixed y. Then y is said to be statistically
independent of z if

$$g(y|z) = f_1(y) \tag{c}$$

for each z. Similarly, z is statistically independent of y if

$$h(z|y) = f_2(z) \tag{d}$$

for each y. It is well known that if (c) is true, then (d) is true, and
conversely; and that a necessary and sufficient condition for (c) and
(d) is that

$$f(y,z) = f_1(y)\, f_2(z) . \tag{e}$$

As a consequence of (e), it follows that, for any powers p and q for
which the indicated moments exist,

$$E\, y^p z^q = (E\, y^p)(E\, z^q), \tag{f}$$

where the expectations are taken over the appropriate universes.

In the present paper, we find it convenient to consider a distribu- 
tion of observations as an unarranged sequence of numbers, so that 
we need to make no explicit statements in terms of frequency func- 
tions. At most we shall require conditions analogous to (f), where 
the highest power is four. That is, instead of requiring complete in- 
dependence, we require at most only such independence as is defined 
by the first four moments. For brevity, however, we shall at times 
speak of “independence” without qualification, even though it is more 
than we need. 

By experimental independence we mean statistical independence
in a universe of trials. As an example, let $y_{ik}$ and $z_{ik}$ be the
values of y and z of the ith person on the kth trial. If y and z are
experimentally independent for the ith person, then it must be true for
such p and q for which the moments exist that

$$E_k\, y_{ik}^p z_{ik}^q = (E_k\, y_{ik}^p)(E_k\, z_{ik}^q) . \tag{g}$$

Another way of stating (g) which is more convenient is:

$$E_k\, (y_{ik}^p - E_k\, y_{ik}^p)(z_{ik}^q - E_k\, z_{ik}^q) = 0 . \tag{g'}$$

In particular, when p = q = 1, (g’) states that the covariance over
trials between y and z is zero for the ith person.




Notice that (g) expresses an independence condition for each 
person separately. For brevity, we shall state that two variates are 
experimentally independent without inserting a qualification, thereby 
implying that this is true for each individual. 

It should be clear that the definition of experimental indepen- 
dence does not involve the relation between y and z considered over 
the population of people. For example, (g) does not involve 

4 


The extension of the notion of experimental independence to 
more than two variables follows easily. An explicit statement of 
what is needed for this paper is given in assumption (C), in the next 
section. 

8. The Basic Assumptions. There are only three kinds of as- 
sumptions employed in this paper: 

Assumption (A). The following moments exist: 


EES; 
ik 


where expectations are taken first over k and then over i.
Assumption (B). The population of individuals and the universe 
of trials are indefinitely large. 
Assumption (C). The items are experimentally independent to 
the extent defined by the equations 


hyjk 


h, he #%; if g then h, h, ort,, andi, #% p,q, 
9,j7=1,2,---,n. 

Let us examine how stringent these assumptions are. 

From the theory of moments, it is well known that if a distribution
has a finite range, then all its moments exist. Since test items
ordinarily permit only finite scores, assumption (A) is in practice
almost invariably fulfilled. If assumption (A) is fulfilled, it follows
that the moments

$$E_i E_k\, t_{ik}^p \qquad (p = 1, 2, 3, 4)$$

exist, since $t_{ik}$ is the sum of a finite number of items.*

Thus far in this paper we have assumed only that $E_i E_k\, t_{ik}$ and
$E_i E_k\, t_{ik}^2$ exist, which is enough to establish basic equation (10),
from which definition (11) was formed.

Assumption (B) is not needed until Part III, where it serves as
part of the sufficient conditions for lower bounds to $\rho_t^2$ to be
observable from a single trial. That the universe of trials be indefinitely
large seems part of the definition of the problem of test-retest reli- 
ability, rather than an empirical assumption; errors of observation 
seem always to be regarded as a sample from some hypothetical uni- 
verse of indefinitely many trials. That the population of individuals 
be indefinitely large, on the other hand, is more of an empirical re- 
striction. There are cases where it may be desired to speak of the 
reliability of a test for a given, relatively small, group of people. The 
biases in setting a lower bound from only a single trial in such a case 
are shown in the results of Part III. These biases, however, disap- 
pear for indefinitely large populations. 

While explicit use is made in Part III of the assumption that the 
population of individuals is indefinitely large, no explicit statement 
is made of the corresponding assumption for the universe of trials. 
In fact, nowhere in any derivations of this paper is it explicitly stat- 
ed that the trials are indefinitely numerous. We therefore point out 
here that it is implicit in the derivations that if there are N persons 
in the population, then there must be at least N + 1 trials in the 
universe if two items are to be experimentally independent. This is
because assumption (C) imposes at least N + 1 linear restrictions
on the $x_{ijk}$, so that if the population is indefinitely large, assumption
(C) cannot hold unless the universe of trials is also indefinitely large.

9. The Assumptions of Experimental Independence. Assumption
(C) states essentially two things:

($C_1$) The observed value of an individual on an item is
experimentally independent of his values on any other items.

($C_2$) The observed value of an individual on an item is
experimentally independent of the observed values of any other individual
on that or any other item.

Assumption ($C_2$) is used in Part III of this paper. Experimental
conditions can usually be established to fulfill this assumption. For
example, if the set of items is an objective examination or an attitude
questionnaire, and if the individuals cannot copy from each other and
can have no indication of how the others are answering, their
responses may ordinarily be considered experimentally independent.

* It becomes another problem to consider limiting cases as
$n \to \infty$, since limits of the moments of $t_{ik}$ need not exist.
Consideration of such limiting cases is required if inferences are sought
concerning a universe of items of which the n are a sample. Ordinarily,
this can be done by taking expectations over the universe of items,
rather than sums, but for this the universe of items should have some
specified structure. A scale (2) is a simple example of a structured
universe of items.

Assumption ($C_1$) is used more immediately, in Part II, and may
afford more difficulty in being realized in practice. What it calls for
is that the deviations from trial to trial of the values of an individual
on one item shall not depend on the deviations on the other items. In
an examination or questionnaire of any substantial length, this
condition can often be approached by an appropriate arrangement of the
items into such a sequence as to minimize carry-over from item to
item within a trial.

It should be remarked that nowhere is it assumed that items are 
“equivalent,” or are “measuring the same thing.” The items may be 
a battery of heterogeneous predictors, assembled to predict some one 
criterion; they may be a sample from a scalable universe, and thus 
be essentially functions of a single variable (2) ; or they may be any- 
thing else. No assumptions are needed beyond (A), (B), and (C) 
to make a practical investigation of reliability. Further assumptions 
like those used conventionally are unnecessary, and are indeed im- 
practical since ordinarily it is difficult to tell when—if at all— they are 
fulfilled. 

10. Correlation Between Two Trials. Before proceeding to our
new results, it may be of interest to see what a classical result looks
like in our present terms. We shall show that $\rho_t^2$ is the same as the
correlation between two independent trials, when such a correlation
is properly defined. In order to speak of two trials as experimentally
independent, we must regard them as one pair out of a universe of
pairs of trials.

Let $t_{1ik}$ and $t_{2ik}$ be the test scores of the ith person on the kth
pair of trials. We have, by definition,

$$E_k\, t_{1ik} = E_k\, t_{2ik} = T_i ,$$
$$E_i E_k\, t_{1ik} = E_i E_k\, t_{2ik} = E_i\, T_i = \mu_T = \mu_t , \tag{12}$$
$$E_i E_k\, (t_{1ik} - \mu_t)^2 = E_i E_k\, (t_{2ik} - \mu_t)^2 = \sigma_t^2 .$$

The covariance between pairs over all individuals and pairs of trials is

$$\sigma_{t_1 t_2} = E_i E_k\, (t_{1ik} - \mu_t)(t_{2ik} - \mu_t), \tag{13}$$

and the correlation between pairs over all individuals and pairs of
trials is clearly

$$\rho_{t_1 t_2} = \frac{\sigma_{t_1 t_2}}{\sigma_t^2} . \tag{14}$$

Now, we assume that the trials are experimentally independent
at least to the extent that covariances over trials vanish for each
person:

$$E_k\, (t_{1ik} - T_i)(t_{2ik} - T_i) = 0 \qquad (i = 1, 2, \cdots). \tag{15}$$

Rewriting (13) as

$$\sigma_{t_1 t_2} = E_i E_k\, [(t_{1ik} - T_i) + (T_i - \mu_T)]
                   [(t_{2ik} - T_i) + (T_i - \mu_T)], \tag{16}$$

expanding, using (15), (12), and (8), we obtain

$$\sigma_{t_1 t_2} = \sigma_T^2 . \tag{17}$$

From (17), (14), and (11),

$$\rho_{t_1 t_2} = \rho_t^2 . \tag{18}$$
It should be carefully noted that (14) does not define the
correlation over individuals between test scores on two trials; it defines
the correlation over individuals and over a universe of pairs of trials.
The correlation over individuals for a pair of trials is defined by

$$\rho_{t_1 t_2 k} = \frac{E_i\, (t_{1ik} - E_i\, t_{1ik})(t_{2ik} - E_i\, t_{2ik})}
  {\sqrt{E_i\, (t_{1ik} - E_i\, t_{1ik})^2 \; E_i\, (t_{2ik} - E_i\, t_{2ik})^2}} . \tag{19}$$

It can be shown, using assumptions (A) and (B), and restating (C)
to refer to $t_{1ik}$ and $t_{2ik}$, that except possibly for an infinitesimal
proportion of the time, $\rho_{t_1 t_2 k}$ is equal to $\rho_{t_1 t_2}$ and hence
to $\rho_t^2$. This can be seen by analogy to the results of Part III.

Therefore, if it is possible to make two independent trials of a test
in practice, on a large population, then by virtue of Part III, the
correlation between the two trials may be taken as equal to the
reliability coefficient.
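
The classical result can be illustrated by a small simulation. The
sketch below is not from the paper: it assumes normally distributed
expected scores and independent normal errors with the stated standard
deviations, a fixed random seed, and hypothetical function names. With
expected-score variance 4 and error variance 1 the reliability
coefficient is 4/5, and the correlation over persons between one pair
of independent trials should fall close to that value when the number
of persons is large.

```python
import random
from math import sqrt


def simulate_two_trials(n_persons=20000, true_sd=2.0, error_sd=1.0, seed=0):
    """Draw one pair of experimentally independent trials for each person."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_persons):
        expected = rng.gauss(0.0, true_sd)              # expected score of the person
        first.append(expected + rng.gauss(0.0, error_sd))
        second.append(expected + rng.gauss(0.0, error_sd))
    return first, second


def correlation(xs, ys):
    """Product-moment correlation over persons."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


first, second = simulate_two_trials()
r = correlation(first, second)
reliability = 2.0 ** 2 / (2.0 ** 2 + 1.0 ** 2)   # 0.8 under the assumed model
print(round(r, 3), reliability)
```

The agreement is only approximate for any finite number of persons,
which is in keeping with the requirement of a large population.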


Part II. Lower Bounds for the Reliability Coefficient 
11. The Item Variances and Covariances. Because of the
empirical difficulties of making two independent trials, it becomes
important to inquire into what can be learned about reliability from
only a single trial. It is to this problem that the remainder of this
paper is devoted. A single trial can yield information about the
reliability of the whole test if at least two experimentally independent
items are involved. In such a case, lower bounds to the reliability
coefficient can be computed from item variances and covariances
observed on a single trial.

To establish a lower bound, we first derive it in terms of para- 
meters defined over all individuals and trials. It is then shown in 
Part III that the required parameters are observable from but a 
single trial. 

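Let the total mean of the jth item be $\mu_j$. Then

$$\mu_j = E_i E_k\, x_{ijk} = E_i\, X_{ij} \qquad (j = 1, 2, \cdots, n). \tag{20}$$

The variance of the expected scores and the total variance,
respectively, of the jth item are

$$\sigma_{X_j}^2 = E_i\, (X_{ij} - \mu_j)^2 \qquad (j = 1, 2, \cdots, n), \tag{21}$$

$$\sigma_{x_j}^2 = E_i E_k\, (x_{ijk} - \mu_j)^2 \qquad (j = 1, 2, \cdots, n). \tag{22}$$

The covariance between expected scores and the total covariance,
respectively, for items g and j are

$$\sigma_{X_g X_j} = E_i\, (X_{ig} - \mu_g)(X_{ij} - \mu_j) \qquad (g, j = 1, 2, \cdots, n), \tag{23}$$

$$\sigma_{x_g x_j} = E_i E_k\, (x_{igk} - \mu_g)(x_{ijk} - \mu_j) \qquad (g, j = 1, 2, \cdots, n). \tag{24}$$

As usual, the covariance of a variable with itself is its variance:

$$\sigma_{X_j X_j} = \sigma_{X_j}^2 , \qquad \sigma_{x_j x_j} = \sigma_{x_j}^2 .$$

Rewriting (24) in the form

$$\sigma_{x_g x_j} = E_i E_k\, [(x_{igk} - X_{ig}) + (X_{ig} - \mu_g)]
                   [(x_{ijk} - X_{ij}) + (X_{ij} - \mu_j)],$$

expanding, using (2) and assumption* (C), we obtain a basic formula:

$$\sigma_{x_g x_j} = \sigma_{X_g X_j} + \delta_{gj}\, E_i\, \sigma_{x_{ij}}^2 , \tag{25}$$

where $\delta_{gj}$ is Kronecker’s delta:

$$\delta_{gj} = 1 \text{ if } g = j, \qquad \delta_{gj} = 0 \text{ if } g \ne j .$$

The two kinds of statements in (25) are explicitly:

$$\sigma_{x_g x_j} = \sigma_{X_g X_j} \qquad (g \ne j), \tag{26}$$

$$\sigma_{x_j}^2 = \sigma_{X_j}^2 + E_i\, \sigma_{x_{ij}}^2 . \tag{27}$$

* Actually only ($C_1$), and even then only with regard to covariances.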


In Part III, it is shown that the $\sigma_{x_g x_j}$, including the
$\sigma_{x_j}^2$, are observable from a single trial. Item covariances from
a single trial are defined by

$$s_{gjk} = E_i\, (x_{igk} - E_i\, x_{igk})(x_{ijk} - E_i\, x_{ijk}) .$$

In §20, it is shown that $s_{gjk} = \sigma_{x_g x_j}$ except possibly for an
infinitesimal proportion of the trials. Therefore, from (26) the
$\sigma_{X_g X_j}$ are also observable when $g \ne j$. But the
$\sigma_{X_j}^2$ are in general not observable, because error variances
$E_i\, \sigma_{x_{ij}}^2$ are not observable. Similarly, $\sigma_t^2$ is
observable, but $\sigma_T^2$ is not.

The lower bounds developed in the following sections are entirely
in terms of observable quantities.


12. A Simple Lower Bound. Subtracting corresponding members
of (5) from (3), we have

$$t_{ik} - T_i = \sum_{j=1}^{n} (x_{ijk} - X_{ij}) .$$

Squaring both members, taking expectations over k, and using (6),
(2), and assumption ($C_1$), yields

$$\sigma_{t_i}^2 = \sum_{j=1}^{n} \sigma_{x_{ij}}^2 . \tag{28}$$

This states that the test error variance of a person is the sum of his
item error variances. Taking expectations over i in (28) yields

$$E_i\, \sigma_{t_i}^2 = \sum_{j=1}^{n} E_i\, \sigma_{x_{ij}}^2 . \tag{29}$$

This states that the total error variance of the test is the sum of the
item error variances.


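Summing both members of (27) over j and using (29) yields
another basic formula:

$$\sum_{j=1}^{n} \sigma_{x_j}^2 = \sum_{j=1}^{n} \sigma_{X_j}^2 + E_i\, \sigma_{t_i}^2 . \tag{30}$$

The left member of (30) is observable from a single trial, so we
immediately have a useful and simple inequality:

$$\sum_{j=1}^{n} \sigma_{x_j}^2 \ge E_i\, \sigma_{t_i}^2 . \tag{31}$$

Using this in (11), we obtain a simple lower bound to the reliability
coefficient that is observable on a single trial. Let

$$\lambda_1 = 1 - \frac{\sum_{j=1}^{n} \sigma_{x_j}^2}{\sigma_t^2} . \tag{32}$$

Then

$$\lambda_1 \le \rho_t^2 \le 1 . \tag{33}$$

The equality on the left holds if and only if $\sigma_{X_j}^2 = 0$
$(j = 1, 2, \cdots, n)$, in which case $\sigma_T^2 = \rho_t^2 = 0$.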
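From (3), (9), and (24), it readily follows that

$$\sigma_t^2 = \sum_{g=1}^{n} \sum_{j=1}^{n} \sigma_{x_g x_j} .$$

Then we can rewrite the lower bound as

$$\lambda_1 = \frac{\sum_{g=1}^{n} \sum_{j=1}^{n} \sigma_{x_g x_j} - \sum_{j=1}^{n} \sigma_{x_j}^2}{\sigma_t^2}
            = \frac{\sum_{g \ne j} \sigma_{x_g x_j}}{\sum_{g=1}^{n} \sum_{j=1}^{n} \sigma_{x_g x_j}} . \tag{34}$$

The right member shows that the numerator is the sum of all the
observed item covariances, omitting the variances, while the
denominator is the sum of all the variances and covariances. Let

$$\Gamma_1 = \sum_{g \ne j} \sigma_{x_g x_j} . \tag{35}$$

Then $\lambda_1$ will be positive, zero, or negative according as $\Gamma_1$ is
positive, zero, or negative. If $\lambda_1$ is zero or negative, then it
affords no information about $\rho_t^2$, for $\rho_t^2$ is always nonnegative.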
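
The two ways of writing the simple bound, one minus the ratio of summed
item variances to total variance, and the ratio of the summed
off-diagonal covariances to the sum of all variances and covariances,
can be compared numerically. The sketch is illustrative: the names and
data are assumptions, and the single-trial covariance matrix (divisor N)
stands in for the parameters.

```python
def cov_matrix(scores):
    """Item-by-item covariance matrix over persons (divisor N)."""
    n_persons = len(scores)
    items = list(zip(*scores))
    means = [sum(col) / n_persons for col in items]
    return [[sum((a - mg) * (b - mj) for a, b in zip(cg, cj)) / n_persons
             for cj, mj in zip(items, means)]
            for cg, mg in zip(items, means)]


# Hypothetical data: 4 persons, 3 dichotomous items.
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
c = cov_matrix(scores)
n = len(c)

all_terms = sum(sum(row) for row in c)                # variances and covariances
variances = sum(c[j][j] for j in range(n))            # diagonal only
off_diagonal = sum(c[g][j] for g in range(n) for j in range(n) if g != j)

from_variances = 1 - variances / all_terms            # first form of the bound
from_covariances = off_diagonal / all_terms           # second form of the bound
print(from_variances, from_covariances)
```

Both forms give 0.5 for these data, and the sign of the bound is
visibly the sign of the off-diagonal sum.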


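13. A Better Lower Bound. A better lower bound than $\lambda_1$ will
now be derived. Since the square of a correlation coefficient cannot
exceed unity,

$$\sigma_{X_g X_j}^2 \le \sigma_{X_g}^2\, \sigma_{X_j}^2 \qquad (g, j = 1, 2, \cdots, n). \tag{36}$$

Then, from (26) and (36),

$$\sigma_{x_g x_j}^2 \le \sigma_{X_g}^2\, \sigma_{X_j}^2 \qquad (g \ne j;\ g, j = 1, 2, \cdots, n). \tag{37}$$

Summing both members over g, but subtracting out the case g = j, we
obtain

$$\sum_{g \ne j} \sigma_{x_g x_j}^2 \le \sigma_{X_j}^2 \left( \sum_{g=1}^{n} \sigma_{X_g}^2 - \sigma_{X_j}^2 \right) \qquad (j = 1, 2, \cdots, n). \tag{38}$$

Summing over all values of j, we obtain

$$\Gamma_2 \le \left( \sum_{j=1}^{n} \sigma_{X_j}^2 \right)^2 - \sum_{j=1}^{n} \sigma_{X_j}^4 , \tag{39}$$

where

$$\Gamma_2 = \sum_{g \ne j} \sigma_{x_g x_j}^2 . \tag{40}$$

$\Gamma_2$ is the sum of the squares of the covariances, omitting the variances.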
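The variance of the $\sigma_{X_j}^2$ over the n items is

$$\alpha^2 = \frac{1}{n} \sum_{j=1}^{n} \sigma_{X_j}^4 - \left( \frac{1}{n} \sum_{j=1}^{n} \sigma_{X_j}^2 \right)^2 ,$$

and

$$\sum_{j=1}^{n} \sigma_{X_j}^4 = n \alpha^2 + \frac{1}{n} \left( \sum_{j=1}^{n} \sigma_{X_j}^2 \right)^2 .$$

Substituting the right member into (39), we obtain

$$\Gamma_2 \le \frac{n-1}{n} \left( \sum_{j=1}^{n} \sigma_{X_j}^2 \right)^2 - n \alpha^2 ,$$

so that clearly

$$\Gamma_2 \le \frac{n-1}{n} \left( \sum_{j=1}^{n} \sigma_{X_j}^2 \right)^2 ,$$

or

$$\sum_{j=1}^{n} \sigma_{X_j}^2 \ge \sqrt{\frac{n}{n-1}\, \Gamma_2} . \tag{42}$$

This last inequality is weaker the more variation there is among the
$\sigma_{X_j}^2$; in particular the equality cannot hold if $\alpha^2 > 0$.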
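From (30) and (42),

$$\sum_{j=1}^{n} \sigma_{x_j}^2 - \sqrt{\frac{n}{n-1}\, \Gamma_2} \ge E_i\, \sigma_{t_i}^2 . \tag{43}$$

Let

$$\lambda_2 = 1 - \frac{\sum_{j=1}^{n} \sigma_{x_j}^2 - \sqrt{\dfrac{n}{n-1}\, \Gamma_2}}{\sigma_t^2} . \tag{44}$$

The right member is observable from a single trial. From (11) and
(43), $\lambda_2$ is a lower bound to the reliability coefficient:

$$\lambda_2 \le \rho_t^2 \le 1 .$$

The equality on the left holds if and only if the variances and
covariances of expected scores are all equal in absolute value.

That $\lambda_2$ is a better bound than $\lambda_1$ can be seen from the fact that

$$\lambda_2 = \lambda_1 + \frac{\sqrt{\dfrac{n}{n-1}\, \Gamma_2}}{\sigma_t^2} . \tag{45}$$

Another way of writing $\lambda_2$ is

$$\lambda_2 = \frac{\Gamma_1 + \sqrt{\dfrac{n}{n-1}\, \Gamma_2}}{\sigma_t^2} . \tag{46}$$

The numerator involves the sum of the covariances and the sum of the
squares of the covariances, omitting the variances and their squares.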

14. An Intermediate Lower Bound. By weakening $\lambda_2$, we can
obtain a lower bound that will be simpler to compute. Let $\beta^2$ be the
variance of the $n(n-1)$ covariances between items; thus,

$$\beta^2 = \frac{\Gamma_2}{n(n-1)} - \left( \frac{\Gamma_1}{n(n-1)} \right)^2 . \tag{47}$$

Therefore

$$\Gamma_2 = n(n-1)\, \beta^2 + \frac{\Gamma_1^2}{n(n-1)} ,$$

or

$$\Gamma_2 \ge \frac{\Gamma_1^2}{n(n-1)} . \tag{48}$$

The equality holds if and only if $\beta^2 = 0$, or all inter-item
covariances are equal.
From (48),

$$\sqrt{\frac{n}{n-1}\, \Gamma_2} \ge \frac{|\Gamma_1|}{n-1} . \tag{49}$$

Using this result in (45), and remembering (34) and (35), we obtain

$$\lambda_2 \ge \lambda_1 + \frac{1}{n-1}\, |\lambda_1| . \tag{50}$$

Therefore the right member is also a lower bound to $\rho_t^2$ and is
intermediate between $\lambda_1$ and $\lambda_2$. But if $\lambda_1$ is negative,
then the right member of (50) becomes

$$\frac{n-2}{n-1}\, \lambda_1 ,$$

which is also negative; and like $\lambda_1$, it yields no information about
$\rho_t^2$. There is a gain in information only if $\lambda_1$ is positive.
Hence, if we let the new lower bound be

$$\lambda_3 = \frac{n}{n-1}\, \lambda_1 , \tag{51}$$

we have lost no information by discarding the absolute value sign.
Both $\lambda_3$ and $\lambda_1$ are useful if and only if $\lambda_1$ is positive.


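We thus have

$$\lambda_3 = \frac{n}{n-1} \left( 1 - \frac{\sum_{j=1}^{n} \sigma_{x_j}^2}{\sigma_t^2} \right) \tag{52}$$

and

$$\lambda_3 \le \rho_t^2 \le 1 .$$

The equality on the left holds if and only if the variances and
covariances of the expected scores on the items are all equal, because
$\lambda_3$ equals $\lambda_2$ only if $\beta^2 = 0$. (For $\lambda_2$ to equal
$\rho_t^2$, it is necessary and sufficient that the $\sigma_{X_g X_j}$ be equal
only in absolute value.)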

$\lambda_3$ is easier to compute than $\lambda_2$, since only the total
variance and the item variances are required.* If the covariances are
all positive and homogeneous, then $\lambda_3$ will not be much less than
$\lambda_2$ and may be an adequate lower bound. If the covariances are
heterogeneous, and in particular, if some are negative, then $\lambda_2$
will be definitely superior to $\lambda_3$. $\lambda_2$ can be positive and
useful when $\lambda_3$ is negative and useless.

* $\lambda_3$ resembles a formula developed separately by Kuder and
Richardson (5) and Hoyt (4). In fact, $L_3$ of §2 above is algebraically
identical to this formula (which is formula (20) in Kuder and
Richardson’s paper). This seems to be only a coincidence. The
derivation of $\lambda_3$ has little in common with the derivation of Kuder
and Richardson’s formula, and has quite a different interpretation. The
parameters of $\lambda_3$ are defined over trials and persons; $\lambda_3$ is a
lower bound to $\rho_t^2$; and the proof that $\lambda_3$ can be computed from
but a single trial by $L_3$ requires no assumption beyond (C). Kuder and
Richardson’s approach makes no treatment of a universe of trials; it
attempts to estimate rather than to bound the reliability coefficient;
it introduces assumptions about the relationships between items; and
observability from but a single trial is not explicitly discussed.
Hoyt’s analysis does seem to consider a universe of trials, albeit
observability from a single trial is not explicitly examined; however,
he introduces very stringent hypotheses about the relationships between
items (which in a sense resemble Kuder and Richardson’s) in terms of a
linear hypothesis of analysis of variance; he attempts to estimate
rather than to bound the reliability coefficient. The stringency of this
linear hypothesis can be seen from the fact that, in our notation, it
requires the equality in (36) to hold for all pairs of items and that
$\alpha = \beta = 0$. But these conditions are rarely, if ever, found in
practice. The hypothesis that $\beta = 0$ can be tested from a single trial.
If the items are part of an approximate scale (2), it can be seen that
$\beta \ne 0$; and if the items are a heterogeneous set of predictors for a
particular criterion, then ordinarily $\beta \ne 0$. The hypotheses that
the equality in (36) holds and that $\alpha = 0$ cannot be tested from but a
single trial, which makes it desirable to avoid them, especially in view
of their strictness. $\lambda_3$ is simple and should be very useful, even
though it is weaker than $\lambda_2$. It is indeed fortunate that $L_3$ needs
no assumption beyond assumption (C) to be used as a lower bound to the
reliability coefficient in practice.


15. “Split-Half” Lower Bounds. A traditional approach to
estimating $\rho_t^2$ has been to divide a test into two parts, assume the
parts to be equivalent and experimentally independent, correlate the two
parts, and “correct” the correlation for test length (1, p. 419). It
is fortunate that we can now dispense with assumptions of
equivalence, yet retain essentially the same computations, and arrive at
a rigorous answer.

If a test is divided into two parts, each part may be thought of
as an item, so that the test may be thought of as composed of two
items. Let $\sigma_a^2$ and $\sigma_b^2$ be the respective variances of the two
parts, taken over all persons and trials. If the two parts are
experimentally independent, then according to (52), with n = 2, we have
a “split-half” lower bound to $\rho_t^2$. If we let

$$\lambda_4 = 2 \left( 1 - \frac{\sigma_a^2 + \sigma_b^2}{\sigma_t^2} \right), \tag{53}$$

then

$$\lambda_4 \le \rho_t^2 \le 1 .$$

Let us compare $\lambda_4$ with the traditional “corrected split-half
coefficient.” As is readily seen,

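$$\sigma_t^2 = \sigma_a^2 + \sigma_b^2 + 2\, \sigma_a \sigma_b\, \rho_{ab} , \tag{54}$$

where $\rho_{ab}$ is the correlation between the two parts over all persons
and trials. If $\sigma_a^2 = \sigma_b^2$, then from (54) and (53), we obtain

$$\lambda_4' = \frac{2 \rho_{ab}}{1 + \rho_{ab}} , \tag{53'}$$

which resembles the “corrected split-half” formula.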
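
The numerical identity between the split-half bound and the
“corrected split-half” expression when the two parts have equal
variances can be checked on a small example. The part scores below are
hypothetical and were chosen so that both parts have variance 0.5; the
helper names are likewise assumptions, and moments use the divisor N.

```python
from math import sqrt


def pcov(xs, ys):
    """Population covariance (divisor N); pcov(xs, xs) is the variance."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical part scores for 4 persons; the two parts have equal variance.
part_a = [2, 1, 1, 0]
part_b = [2, 1, 0, 1]
totals = [a + b for a, b in zip(part_a, part_b)]

var_a = pcov(part_a, part_a)
var_b = pcov(part_b, part_b)
var_t = pcov(totals, totals)
r_ab = pcov(part_a, part_b) / sqrt(var_a * var_b)   # correlation between parts

split_half_bound = 2 * (1 - (var_a + var_b) / var_t)
corrected_split_half = 2 * r_ab / (1 + r_ab)
print(round(split_half_bound, 4), round(corrected_split_half, 4))
```

Both expressions come to 2/3 here, with a correlation of 0.5 between
the parts. When the part variances differ, the two expressions no longer
coincide, and only the first requires no assumption of equal variances.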

But $\lambda_4'$ is still a lower bound to $\rho_t^2$. It does not assume
equivalent halves; it assumes only experimentally independent halves
with equal variances. If the two halves are not equivalent (but have
equal variances), then $\lambda_4'$ definitely underestimates $\rho_t^2$.

This has a very important bearing on all previous empirical re- 
search which has used “corrected split-half coefficients.” It seems 
plausible that in a vast number of cases the hypothesis of equiva- 
lence is dubious, and that therefore the reliability of a great many 
tests has been seriously underestimated. 

If reliability is underestimated, attenuation is overestimated. 
Therefore, “corrections for attenuation” used in past research (1, p. 
367) must be regarded with caution, for in many cases they are falla- 
ciously high. This helps account for “corrections” that yield correla- 
tions greater than unity in practice. 

In practice, there is no need to use even the hypothesis
$\sigma_a^2 = \sigma_b^2$. Formula (53) is preferable to (53’) because it
needs no such assumption.

If a test contains many items, there are many ways of splitting
it into two parts, and each way may yield a different $\lambda_4$. The largest
of these $\lambda_4$ is, of course, the best bound for $\rho_t^2$ computed in this
manner. In practice, a sufficiently high $\lambda_4$ can often be found by a
careful splitting of the test, so that there would be no need to seek
the best $\lambda_4$.
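
For a short test the best split can be found by simple enumeration, as
in the sketch below. This is an illustration under assumed names and
data (moments with divisor N); the number of splits grows exponentially
with the number of items, so exhaustive search is practical only for
small n, which is one reason a careful hand-chosen split is often
enough.

```python
from itertools import combinations


def pvar(values):
    """Population variance (divisor N)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(scores):
    """Return (largest split-half bound, item indices of one part)."""
    n = len(scores[0])
    totals = [sum(row) for row in scores]
    total_var = pvar(totals)
    best = None
    # Parts of size 1 .. n//2 cover every split (some are visited twice
    # when n is even, which is harmless).
    for size in range(1, n // 2 + 1):
        for part in combinations(range(n), size):
            a = [sum(row[j] for j in part) for row in scores]
            b = [t - x for t, x in zip(totals, a)]
            bound = 2 * (1 - (pvar(a) + pvar(b)) / total_var)
            if best is None or bound > best[0]:
                best = (bound, part)
    return best


# Hypothetical data: 4 persons, 3 dichotomous items.
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
bound, part = best_split(scores)
print(round(bound, 4), part)
```

For these data the best split sets the second item against the other
two and gives a bound of about 0.8.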

If a test can be split into two parts that correlate relatively
highly with each other, this will ordinarily yield a $\lambda_4$ that is
better than $\lambda_3$ or even $\lambda_2$.
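16. A Lower Bound Based on a Best Row of Covariances. A
lower bound which may in some cases be better than $\lambda_2$ can be
established as follows. From (38), we see that

$$\sum_{g=1}^{n} \sigma_{X_g}^2 \ge \sigma_{X_j}^2 + \frac{\Gamma_{2j}}{\sigma_{X_j}^2} \qquad (j = 1, 2, \cdots, n), \tag{55}$$

where

$$\Gamma_{2j} = \sum_{g \ne j} \sigma_{x_g x_j}^2 \qquad (j = 1, 2, \cdots, n). \tag{56}$$

By differentiation, the minimum of the right member of (55) as a
function of $\sigma_{X_j}^2$ is found to be attained when $\sigma_{X_j}^2$ is
equal to $\sqrt{\Gamma_{2j}}$. Using this minimizing value, we obtain

$$\sum_{g=1}^{n} \sigma_{X_g}^2 \ge 2 \sqrt{\Gamma_{2j}} \qquad (j = 1, 2, \cdots, n). \tag{57}$$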
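
The differentiation step can be written out as follows. This is a
supplementary worked sketch, not part of the original text; the
shorthand $v$ for the variance of expected scores of item j and $c$ for
the corresponding sum of squared covariances is introduced only for
this purpose, and $v > 0$, $c > 0$ are assumed.

```latex
% Minimizing f(v) = v + c / v over v > 0, with c > 0 held fixed.
\begin{align*}
  f(v)   &= v + \frac{c}{v}, \\
  f'(v)  &= 1 - \frac{c}{v^{2}} = 0
            \quad\Longrightarrow\quad v = \sqrt{c}, \\
  f''(v) &= \frac{2c}{v^{3}} > 0
            \quad\text{(so the stationary point is a minimum)}, \\
  f(\sqrt{c}) &= \sqrt{c} + \frac{c}{\sqrt{c}} = 2\sqrt{c}.
\end{align*}
```

Since the right member of the inequality is never smaller than
$2\sqrt{c}$, whatever the unobservable value of $v$ may be, the sum of
the variances of expected scores is at least $2\sqrt{c}$.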
1 
Let 
Dot 
45, =1—7 (58) 


From (30), (11), and (58) we see that we have n lower bounds to 


pt?: 
dsj Sp? (j=1,2,---,m). (59) 


Let A; be the largest of the j;;. Then A; is the best lower bound based 
on the covariances with a single item in this fashion. Explicitly, 


Q=1 
or 
where I’, is the largest of the I’; . 
That A; is in general better than J, follows from the fact that 


As=a + 


ot 


As will be greater than /, in those cases where 


= n 
F., 
n—1 


which requires that n > 2 and that one item have large absolute co- 
variances with the other items compared with the covariances among 
those items. Otherwise, 4; . 
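A small numerical sketch may make the comparison concrete. It is an illustration under stated assumptions rather than the author's procedure. It takes $\lambda_1 = 1 - \sum \sigma_j^2 / \sigma_t^2$ and $\lambda_2 = \lambda_1 + \sqrt{\tfrac{n}{n-1} \sum_j \Gamma_j}\,/\sigma_t^2$, the forms in which these bounds are usually stated (the earlier sections defining them are not reproduced here), and it takes $\Gamma_j$ to be the sum of squared covariances of item j with the other items. The covariance matrix and function name are hypothetical.

```python
import math

def guttman_bounds(cov):
    """Return (lambda1, lambda2, lambda5) for an item covariance matrix."""
    n = len(cov)
    var_t = sum(sum(row) for row in cov)                       # total-score variance
    lam1 = 1.0 - sum(cov[j][j] for j in range(n)) / var_t
    # Gamma_j: sum of squared covariances of item j with every other item
    gammas = [sum(cov[g][j] ** 2 for g in range(n) if g != j) for j in range(n)]
    lam2 = lam1 + math.sqrt(n / (n - 1) * sum(gammas)) / var_t
    lam5 = lam1 + 2.0 * math.sqrt(max(gammas)) / var_t
    return lam1, lam2, lam5

# Hypothetical case: item 0 covaries .5 with each other item; the others are uncorrelated.
cov = [
    [1.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
]
lam1, lam2, lam5 = guttman_bounds(cov)
print(round(lam1, 3), round(lam2, 3), round(lam5, 3))  # prints 0.429 0.631 0.676
```

Here one item carries all of the covariance, which is the situation in which the bound from the best row exceeds the bound from all the covariances taken together.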

17. A Lower Bound Based on Linear Multiple Correlation. All
the preceding lower bounds will often fail to approach unity in practice,
even though $\rho_t^2 = 1$. In this section, we shall see that a high
lower bound can be computed for the case where each item has a high
linear multiple correlation on the remaining items.

Let $\beta_{jg}$ $(j, g = 1, 2, \ldots, n)$ be any set of $n^2$ constants, and let

\[ y_{ijk} = \sum_{g=1}^{n} \beta_{jg} (x_{igk} - \mu_g). \tag{61} \]

Thus, $y_{ijk}$ is an arbitrary linear combination of the $x_{igk}$ deviates. Let
= —Yin)?, (G=1,2,---, 2) (62) 
ik 


and let 
n°? = —yinx)?, (63) 
ik 


Upon expansion, 
+e, 
vj 
and, using (25), 
Therefore, 
, 


or, using (27), 


For the case $\beta_{jj} = 0$, we obtain the inequalities
, (j7=1,2,---,”). (64) 


The left member is a minimum when the $\beta_{jg}$ are the regression coefficients
of the $x_{igk}$ in the regression of $x_{ijk}$ on the remaining $n-1$
items, over the universe of trials and the population of individuals;
and the minimum of the left member is the variance of the errors of estimate from
this regression. Hence the


Theorem: The unreliability error variance of an item is not greater
than the linear regression error variance of that item from the
$n-1$ remaining items.*

* The proof of this theorem is adaptable to afford another proof for a similar
theorem for factor analysis, established elsewhere (3), to the effect that the
square of the multiple correlation coefficient is a lower bound to the communality.

As a corollary, if an item has a perfect multiple correlation on
the remaining items, it is perfectly reliable.

Let $e_j^2$ be the multiple regression error variance of the jth item,
and let

\[ \lambda_6 = 1 - \frac{\sum_{j=1}^{n} e_j^2}{\sigma_t^2}. \tag{65} \]

From (64), (65), (29), and (11), we see that $\lambda_6$ is a lower bound to
the reliability coefficient:

\[ \lambda_6 \le \rho_t^2. \]


Crudely speaking, $\lambda_6$ will be larger than $\lambda_2$ when the items have
relatively low zero-order correlations and high multiple correlations;
and $\lambda_6$ will be smaller than $\lambda_2$ when the zero-order correlations among
the items are relatively high compared with the multiple correlations.
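The bound based on multiple correlation can be computed from the item covariance matrix alone. The sketch below is a minimal illustration and not part of the paper. It relies on the standard identity that the error variance from regressing item j on the remaining items is the reciprocal of the jth diagonal element of the inverse covariance matrix. The helper names and the covariance matrix are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def lambda6(cov):
    """One minus the summed regression error variances over the total variance."""
    n = len(cov)
    inverse = invert(cov)
    # error variance of item j regressed on the other items (assumed identity, see lead-in)
    error_variances = [1.0 / inverse[j][j] for j in range(n)]
    var_t = sum(sum(row) for row in cov)
    return 1.0 - sum(error_variances) / var_t

# Hypothetical case: item 0 covaries .5 with each other item; the others are uncorrelated.
cov = [
    [1.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
]
print(round(lambda6(cov), 3))
```

If some item has a perfect multiple correlation on the others, the covariance matrix is singular and this sketch raises an error; that limiting case would need separate handling.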

If the set of items is part of a scale (2), then as n increases the
multiple correlation of any item on the remaining items will ordi- 
narily approach unity. Therefore, given the experimental indepen- 
dence of the items, a sample scale with an appreciable number of items 
is ordinarily highly reliable. 


PART III. Observability from a Single Trial


18. Observability of Means from a Single Trial. A crucial, and 
hitherto neglected, problem of test-retest reliability is: what para- 
meters are observable in a single trial? In particular, to use the low- 
er bounds established in this paper, we must show that they are ob- 
servable. Since only means and covariances are required for these 
bounds, we shall restrict ourselves to showing that first and second 
moments are observable. 

The proof involves the notion of convergence in the mean; spe- 
cifically, we use the fact that if the variance of a variate is zero, the 
variate vanishes except possibly for an infinitesimal proportion of the 
observations. To bring out the details of the proof and thus obtain 
general results for a finite population, we shall begin with a finite 
population of N individuals. Then the operation $E_i$ is the same as the
operation $\frac{1}{N}\sum_{i=1}^{N}$, where the expectation is over individuals.

The mean of the jth item on a single trial is $E_i x_{ijk}$. The variance
of this mean over all trials is

\[ E_k (E_i x_{ijk} - \mu_j)^2 = E_k [E_i (x_{ijk} - X_{ij})]^2 = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{h=1}^{N} E_k (x_{ijk} - X_{ij})(x_{hjk} - X_{hj}). \]

From assumption (C), the last member becomes

\[ \frac{1}{N^2} \sum_{i=1}^{N} E_k (x_{ijk} - X_{ij})^2. \]

Therefore

\[ E_k (E_i x_{ijk} - \mu_j)^2 = \frac{1}{N} E_i E_k (x_{ijk} - X_{ij})^2, \tag{66} \]

and, since the error variances are all bounded according to assumption (A),

\[ \lim_{N \to \infty} E_k (E_i x_{ijk} - \mu_j)^2 = 0. \tag{67} \]

Now $E_k E_i x_{ijk} = \mu_j$.
Therefore, for an infinite population, in all, except for possibly an
infinitesimal proportion of trials, we have

\[ E_i x_{ijk} = \mu_j \qquad (j = 1, 2, \ldots, n), \tag{68} \]

which was to be shown.

Equation (66) shows the important fact that if N is finite and if
$E_i E_k (x_{ijk} - X_{ij})^2 > 0$, then (68) does not hold for a substantial proportion of
trials. This implies that in general, for finite N, the mean of the errors
of observation will not be zero on many trials.


19. On the Bias of Trial Variances and Covariances. The covariance
between items g and j on a single trial is

\[ \gamma_{gjk} = E_i (x_{igk} - E_i x_{igk})(x_{ijk} - E_i x_{ijk}). \tag{69} \]

In particular, of course, $\gamma_{jjk}$ is the variance of the jth item in the kth
trial. Using a familiar device of least squares, we write


— Xig) + (Xig — — (Xion — Xig) — 
+ — E(an—Xu)1, 


which upon expansion yields 
Poi k = E — Xig) (Lin — Xij) + (Xig — Mg) (Hin — Xis) 
(70) 
+ — — Xig) + y, EE — Xng) — Xij) 
i hi 
Taking expectations over k and using assumption (C), we obtain 


1 
k 


The right member of (71) shows that covariances between items
on a single trial are unbiased estimates of the universe covariances, whereas trial
variances are biased as long as N is finite and there is unreliability.
An important fact is that for finite N, the bias in variances cannot
be estimated from a single trial, for the error variances cannot be estimated from
a single trial. For very large N, of course, the bias must be negligible.


20. Observability of Variances and Covariances. To show that 
universe variances and covariances are observable from a single trial 
involves much more detail than for the case of the mean in §18, but 
the proof is similar. We shall show that when the population is infinite,
the trial covariance $\gamma_{gjk}$ equals the corresponding universe covariance
except possibly in an infinitesimal proportion of trials.

From (70) and (25), 
= E (Xigk — Xig) — Xiz) + (Xig — oy) (ijn — 


(72) 
+ E(Xij — — Xig) — oF — EE — Xing) (tise — 
4 hi 


Vag, 


Squaring both members, taking expectations over k, and using 
assumption (C), we obtain, after some tedious algebra, 


Y9i, N N , % % 


+E(Xig— + B( Xi; — Ba 
(73) 

4( 1) 


N 


2 
B(X, + +5 Bet, 


where 
(tin — (p=38,4). 
k 
Therefore, since all the moments on the right of (73) are bound- 
ed according to assumption (A), 


lim E (75,5 — (74) 
k 


Hence, for an infinite population of individuals, 




VYoik—Y 
except possibly for an infinitesimal proportion of times. 
The structure of the right member of (73) indicates the caution 
to be observed with respect to relatively small N . 


21. Observability of the Lower Bounds. We have thus far shown 
how, for an infinite population, means and covariances are observable.
But each lower bound involves a ratio of a combination of covariances
to $\sigma_t^2$. To discuss what happens to ratios for finite populations
is rather complicated. The case of an infinite population is
readily handled, however. As is to be expected, any of the lower 
bounds can be computed from a single trial provided the population 
is infinite. (Similarly, the reliability coefficient can be computed from 
two independent trials if the population is infinite.) 

The proof follows immediately from combinatorial probability 
considerations. If each variate in a set has probability zero of not 
being constant, then the probability that at least one is not constant 
cannot be greater than zero. Hence the probability is unity that all 
are constant, and that any function of them is constant. 

Therefore, for an infinite population, any function of the $\gamma_{gjk}$,
for fixed k, will have probability of unity of being equal to the same
function of the universe covariances. Thus, any of the lower bounds can be computed
from a single trial by substituting the observed variances and covariances
of the trial for the total variances and covariances.

In conclusion, it should be emphasized that this paper has not 
concerned itself with the problem of dealing with only a sample from 
the population of individuals. Our results refer to a trial of all of a 
large population. Sampling problems involve far more detail, and the 
intricacies of the sampling distributions of ratios loom large in the 
picture. 


REFERENCES 


1. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.

2. Guttman, Louis. A basis for scaling qualitative data. American Sociological
Review, 1944, 9, 139-150.

3. Guttman, Louis. Multiple rectilinear prediction and the resolution into components.
Psychometrika, 1940, 5, 75-99.

4. Hoyt, Cyril. Test reliability estimated by analysis of variance. Psychometrika,
1941, 6, 153-160.

5. Kuder, G. F., and Richardson, M. W. The theory of the estimation of test reliability.
Psychometrika, 1937, 2, 151-160.

6. Spearman, Charles. The proof and measurement of association between two
things. American Journal of Psychology, 1904, 15, 72-101.

7. Spearman, Charles. Correlation calculated from faulty data. British Journal
of Psychology, 1910, 3, 271-295.


PSYCHOMETRIKA—VOL, 10, NO. 4 
DECEMBER, 1945 


A SIMPLE ORTHOGONAL MULTIPLE FACTOR 
APPROXIMATION PROCEDURE 


HILDING B. CARLSON 
OCCIDENTAL COLLEGE 


This paper describes a simple orthogonal multiple factor ap- 
proximation procedure that involves no inversion of the sign of neg- 
ative residuals, the estimation of only as many communalities as 
there are factors, and none or only a few minor rotations of the 
axes in an attempt to obtain a “meaningful” solution. It also sug- 
gests a technique for the estimation of those communalities that 
must be estimated. The factor loadings obtained by means of this 
procedure, which we shall designate as the pre-selection* procedure, 
are affected by the order in which the factors are obtained, showing 
a reduction in variance accounted for by each successive factor, as 
is characteristic of the centroid, bi-factor, and principal factor so- 
lutions. The entire procedure takes considerably less time than that 
involved in the orthodox centroid method alone. 


In an article on another simple procedure for approximate fac- 
tor analysis, Woodrow and Wilson (5) point out that “the grounds 
for postulating a factor are merely that the tests show a certain clus- 
tering as regards their inter-correlations, that is, that the tests fall 
into groups so that the intra-group correlations are relatively high 
as compared with the inter-group correlations.” The pre-selection 
procedure goes further in this direction, in that it assumes that the 
highest correlation in a correlational or residual matrix is most likely 
to involve the test in which a factor has its largest loadings and that 
any succeeding factor can have no loading in this test. Inspection of 
published factor analyses shows that such conditions can be observed 
repeatedly where the centroid method, followed by rotation to or- 
thogonal simple structure, has been used. The pre-selection proce- 
dure involves pivoting on one of the two tests showing the highest 
correlation in the correlational matrix to be factored and obtaining 
the first-factor loadings from the correlations of this one test with 
the remaining tests. The amount of correlation due to the first fac- 
tor thus obtained is subtracted from the correlational matrix in the 
usual manner. The second-factor loadings are obtained from the resi- 
dual matrix, pivoting on one of the two tests which show the highest 
residual correlation. The correlations due to this second factor are 
then subtracted from the first residual matrix in the usual manner,
getting a second matrix of residuals. This procedure is repeated until
all residuals are zero, or the analysis is considered completed.

*This name was suggested by Dr. E. L. Welker, Department of Mathematics,
University of Illinois.


The Estimation of Communality 

When communalities are not known, as is usually the case, the 
customary procedure in Thurstone’s centroid method (3) is to insert 
the largest correlation in each row or column in the empty diagonal 
cell in the correlational or residual matrix. In the case of small corre- 
lational matrices, closer approximations to the correct communality 
may be obtained by following the procedure suggested by Guilford 
(1) of substituting the calculated communalities obtained by a first 
factor analysis in the empty diagonal cell and then re-factoring. 
After three or four successive substitutions of communalities are 
thus obtained, both communalities and factor loadings become stable 
and the analysis is considered satisfactory. 

In some instances the proper communality can be immediately 
obtained before any factorial analysis is made. For purpose of dem- 
onstration, let us assume for the moment that we know only the cor- 
relation coefficients, neither factor loadings nor communalities, as in 
the case of the coefficients for Data A given in Table 1. The diagonal 
entries for tests a, b, c, d, and e are not known and must be estimat- 
ed. In this instance, they may be estimated correctly by the follow- 
ing procedure. 

On a piece of graph paper, erect an ordinate, with a scale of 
values ranging from .00 to 1.00, and indicate thereon the coefficients 
for any test, say a, with each of the remaining tests. On another 
ordinate, separated from that for ordinate a by an arbitrarily select- 
ed unit of one, do the same for any other test, say b. Draw lines 
through the point ac, representing $r_{ac}$, and the point bc, representing
$r_{bc}$. Do likewise for ad and bd, and for ae and be. This is shown
in Figure 1. If these lines are extrapolated they reach the abscissa 
at a common point. By the use of the common point on the base line 
it is possible to obtain the communality for tests a and b directly from 
the graph. The communality for test a can be obtained by drawing 
a line from this common point on the abscissa through the point ab 
on the b ordinate, and extrapolating this line until it crosses the a 
ordinate. The value for the communality for test a is then read di- 
rectly from the intersection just made and in this instance is equal 
to 1.00. The value for the communality for test b is obtained by draw- 
ing a line from the point ab on the a ordinate to the common point on 
the base line. This line crosses the b ordinate at .72 and this is the 
communality for test b. By the construction of similar graphs involving
the remaining tests all the communalities can be obtained.
The resulting communalities are found to be a, 1.00; b, .72; c, .30;
d, .12; and e, .02. The communality for test a could, of course, be
obtained from any paired column of correlation coefficients, as a with 
c, a with d, or a with e. Likewise the communality for each of the 
other tests can be obtained. 
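The graphical construction has a simple arithmetic equivalent, sketched below as an illustration only. The ordinate for test a is placed at x = 0 and that for test b at x = 1. The correlations are rebuilt from the single-factor loadings reported in the text (1.00, .85, .55, .35, .15) rather than copied from Table 1. The function names are hypothetical, and averaging the crossing points is a convenience added here; in the error-free single-factor case all crossings coincide.

```python
def abscissa_crossing(r_a, r_b):
    """x at which the line through (0, r_a) and (1, r_b) reaches zero."""
    return r_a / (r_a - r_b)

def communalities_from_pair(r_ab, r_a_others, r_b_others):
    """Arithmetic version of the two-ordinate graph for tests a and b."""
    crossings = [abscissa_crossing(ra, rb) for ra, rb in zip(r_a_others, r_b_others)]
    x0 = sum(crossings) / len(crossings)      # the common point on the base line
    h2_a = r_ab * x0 / (x0 - 1.0)             # line from (x0, 0) through (1, r_ab), read at x = 0
    h2_b = r_ab * (x0 - 1.0) / x0             # line from (0, r_ab) to (x0, 0), read at x = 1
    return x0, h2_a, h2_b

# Single-factor loadings as reported in the text for Data A.
loadings = {"a": 1.00, "b": 0.85, "c": 0.55, "d": 0.35, "e": 0.15}

def r(p, q):
    """Correlation implied by one common factor."""
    return loadings[p] * loadings[q]

x0, h2_a, h2_b = communalities_from_pair(
    r("a", "b"),
    [r("a", t) for t in "cde"],
    [r("b", t) for t in "cde"],
)
print(round(x0, 3), round(h2_a, 4), round(h2_b, 4))
```

The two values agree with the communalities of 1.00 and .72 read from the graph. The sketch divides by the difference of two correlations, so it fails when a pair of correlations is equal, just as the corresponding lines on the graph would then be parallel to the abscissa.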

It can now be revealed that the correlation matrix in Table 1 was 
obtained from a single assumed factor with loadings as given in Table 
2, on the usual assumption that the correlation between any two tests 
equals the sum of the cross-products of the loadings with each fac- 
tor. As can be seen, the estimated communalities and the true com- 
munalities, obtained by squaring the factor loadings, are equal to two 
decimals. In passing, it is of considerable interest to note that the 
communality for the test in which the factor has the largest loading 
is larger than the largest correlation of that test with the other tests, 
whereas the communality for each of the remaining tests is smaller 
than the largest correlation of each of these tests with the other tests. 

The above illustration shows that it is possible, when only one 
factor is present in a correlational matrix and there are no chance 
errors present, to obtain the proper communalities from the correla- 
tions themselves, without prior knowledge of the factor loadings and 
prior to any factoring. A technique for approximating the procedure 
involved in this illustration is one of the basic features of the pre- 
selection factoring procedure when estimating the communality of a 
test when more than one factor is present and when chance errors are 
also present. 


The Determination of Factor Loadings 

When a pure test, i.e., one that has a factor loading of 1.00, is 
present in data where a single factor accounts completely for the in- 
tercorrelations, there is a clear-cut relationship between the factor 
loadings of the factor in the remaining tests and the correlations of 
the pure test with the remaining tests. The factor loadings of the 
remaining tests are equal to the correlation coefficients of this pure 
test with the remaining tests, as seen in Data A in Table 1. Since 
“any factor represents the correlation between a test and one of the 
reference abilities” (1), “reference ability” refers to the ability or 
characteristic measured by a factor whose loadings are equal to the 
correlations of a pure test of that factor with each of the remaining 
impure tests. 

A pure test of a factor being one that has a factor loading of
1.00, the nearer a factor loading approximates 1.00, the purer is the
test of the factor in question, and when a factor loading approximates
.00, the test is no measure of the factor whatever. If factor
loadings are the correlations between a pure test of a factor and im- 
pure tests, then the factor loadings presumably should be capable of 
being determined by an equation similar to that expressing the rela- 
tion between a true criterion and a fallible test, i.e., by an equation 
similar to

\[ r_{Tx} = \frac{r_{xy}}{r_{Ty}}. \]

In this equation, when used for factoring,

$r_{Tx}$ = factor loading in test x.

$r_{xy}$ = correlation coefficient between test x and test y,
test y being one of the pair of tests having the
largest correlation coefficient and that one of the
pair selected to pivot upon.

$r_{Ty}$ = factor loading of test y, estimated by the pre-selection
procedure.


The efficacy of this equation can be indicated by using it to ana- 
lyze the correlation matrix in Table 1, after dropping from the corre- 
lation matrix the test a, since it has a factor loading of 1.00 and is 
therefore the pure test of the factor originally assumed. The first 
step in the analysis is to select the largest correlation coefficient in 
the correlational matrix. This is .468, between tests b and c. The 
factor loadings are to be obtained by pivoting on test b or c, i.e., upon
the correlations of one of these tests with the remaining tests. Which 
shall it be? After trial and error attempts had been made with vari- 
ous procedures, it has been found desirable to select that test of the 
pair having the highest correlation which has the largest range of 
correlation values, excluding the largest correlation. In this instance, 
test b has a range of .170, while test c has a range of .109. Test b is 
therefore selected to pivot upon. The communality for test b is found 
to be .72 by means of the technique already described. The square 
root of the communality of .72 for test b gives its factor loading. The 
remaining factor loadings are calculated by means of the equation 
given. Doing so, the loadings are found to be: b, .85; c, .55; d, .35; 
and e, .15. These are the assumed factor loadings. 
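The steps of this paragraph can be sketched as follows. This is a minimal illustration, not the author's worksheet. The correlations are rebuilt from the loadings reported above, the function names are hypothetical, and the communality for test b is entered as .7225, the square of the reported loading, which the text gives rounded to .72.

```python
import math

names = ["b", "c", "d", "e"]
assumed = [0.85, 0.55, 0.35, 0.15]   # loadings reported in the text
# Correlation matrix with unknown (None) diagonal cells.
corr = [[assumed[i] * assumed[j] if i != j else None for j in range(4)]
        for i in range(4)]

def pick_pivot(corr):
    """Find the largest correlation, then pivot on the test of that pair whose
    other correlations have the larger range."""
    n = len(corr)
    largest, i, j = max((corr[i][j], i, j) for i in range(n) for j in range(i + 1, n))

    def spread(test, partner):
        rest = [corr[test][k] for k in range(n) if k not in (test, partner)]
        return max(rest) - min(rest)

    return (i, j) if spread(i, j) >= spread(j, i) else (j, i)

def factor_loadings(corr, pivot, communality):
    """Pivot loading is the root of its communality; every other loading is the
    correlation with the pivot divided by the pivot loading."""
    pivot_loading = math.sqrt(communality)
    return [pivot_loading if x == pivot else corr[x][pivot] / pivot_loading
            for x in range(len(corr))]

pivot, partner = pick_pivot(corr)
loadings = factor_loadings(corr, pivot, communality=0.7225)
print(names[pivot], [round(v, 2) for v in loadings])
```

With these data the pivot is test b, and the loadings come back as the assumed values.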

By means of the pre-selection factoring procedure, the factor 
loadings have been calculated, involving the determination of only 
one of the four communalities possible and making use of the correlations
of this one test with the remaining tests. This one test is one
of the two involved in the largest correlation coefficient in the correlation
matrix and is that one of the two which has the largest range
of correlation values. The results agree perfectly with those obtained 
by means of the centroid method, when all four proper communalities 
have been inserted into the empty diagonal cells of the correlation 
matrix. Both methods reproduce the assumed factors perfectly. 


Extension of the Pre-Selection Factoring Procedure to Correlation 
Matrices Involving More Than One Factor 

The question may be raised as to whether the pre-selection fac- 
toring procedure can be used profitably to factor a correlation matrix 
involving more than one factor. For demonstrational purposes, it is 
well to use artificial data, since only in such can it be certain that 
the procedure gives results which are unequivocal. Guilford (1) has 
published the results of a centroid analysis involving two factors, and 
Woodrow and Wilson (5) another involving four factors, analyzed 
by both the centroid procedure and a modification of the centroid pro- 
cedure. Both of these will be considered here. 

Guilford, using a centroid analysis with orthogonal rotations, 
is able to reproduce the assumed factor loadings with minor discrep- 
ancies. Since his analysis is given completely in Psychometric Meth- 
ods (1) only the assumed loadings, obtained factors, and residuals 
will be given here in Table 3 in order to compare them with those 
obtained by means of the pre-selection procedure. 


In the pre-selection procedure, the first step is to select the largest
correlation coefficient in the correlation matrix. In Guilford’s
data, there are two equally large, .74, between tests 3 and 9, and between
tests 5 and 8. The next step is to select the test of the pair
which has the largest range of correlation values, excluding from consideration
the correlation first selected. In this
instance, test 3 has a range of .72 - .16, or .56; test 9, .64 - .08, or
.56; test 5, .66 - .16, or .50; and test 8, .72 - .08, or .64. Test 8, having
the largest range of correlation values among these four tests, is
therefore selected to pivot upon. Figure 2, involving the correlation
coefficients for tests 5 and 8, is next drawn, as indicated in the technique
for determining communality.

It is at once apparent that the extrapolated lines do not meet at 
a common point on the abscissa. This absence of a common point in- 
dicates at once that more than one factor or chance errors are pres- 
ent. As earlier stated, the assumption is made that one factor is 
mainly involved in the largest correlation, and that the effect of any 
other factor is relatively slight; the largest correlation will therefore 
be treated as if it were due to but one factor. The absence of a common
point on the abscissa further complicates the problem of determining
the communality for test 8. Numerous schemes have been
tried, but to date the following procedure seems most successful.*
Disregard all lines through the pair of correlation coefficients indicated
on the ordinates for tests 5 and 8 where the correlations for the
test to be pivoted upon (test 8) are smaller than those with the other
test involved (test 5), and where the paired correlations are unlike in
sign. In this instance, it means disregarding those lines through correlation
coefficients for tests 1, 2, 3, 4, and 9. Determine the distance
on the abscissa, with the distance between the ordinates as unity,
between ordinate 8 and the point where each of the remaining
lines crosses the abscissa. In this instance, the line through test 7
crosses at point 9, for test 10 at 16.5, and for test 6 at 34. The sum
of these distances, divided by one more than the number of tests involved
(here 59.5/4), gives a point on the abscissa that can be used to approximate the
communality. A line is drawn from this point (14.9) on the abscissa
through point .74 on the ordinate for test 5, and extrapolated
until it crosses the ordinate for test 8. This intersection, .7933, is the
estimated communality for test 8. Its square root, .8907, is the first
factor loading for test 8.
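The arithmetic of this approximation can be checked with a short sketch. It is an illustration only and rests on a reading of the geometry that is assumed here: distances are measured from the ordinate for test 8, and the ordinate for test 5 lies one unit nearer the crossing point. The function names are hypothetical; the three distances and the correlation of .74 are those given in the text.

```python
import math

def pooled_crossing(distances):
    """Sum of the crossing distances divided by one more than their number."""
    return sum(distances) / (len(distances) + 1)

def communality_from_crossing(largest_r, crossing):
    """Height on the pivot ordinate of the line from the crossing point through
    the largest correlation, which is marked one unit nearer the crossing."""
    return largest_r * crossing / (crossing - 1.0)

distances = [9.0, 16.5, 34.0]            # tests 7, 10, and 6
crossing = pooled_crossing(distances)    # 59.5 / 4
h2 = communality_from_crossing(0.74, crossing)
loading = math.sqrt(h2)
print(round(crossing, 3), round(h2, 4), round(loading, 4))
```

Under this reading the sketch returns the values quoted in the text, a communality of about .7933 and a loading of about .8907, which supports taking the divisor to be the number of lines plus one.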

Each correlation for test 8 with the remaining tests is then di- 
vided by the first factor loading for test 8, or more simply, multiplied 
by the inverse of this value. The results are the first factor loadings. 
They are given in Table 3. 

The next step is to subtract the amount of the correlation ac- 
counted for by the first factor from the original correlation matrix. 
This is done in the same manner as in the centroid procedure, except 
that other checks must be substituted to make certain that there are 
no arithmetic errors. This gives the first factor residuals. 
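The subtraction step can be sketched as below. This is a minimal illustration on hypothetical two-factor loadings for three tests, not on Guilford's data, and the function name is an assumption. It uses the relation stated earlier in the paper that a correlation equals the sum of the cross-products of the loadings on each factor, so that removing one factor leaves the cross-products of the remaining ones.

```python
def residual_matrix(corr, loadings):
    """Subtract the cross-products of one factor's loadings from the off-diagonal cells."""
    n = len(corr)
    return [[None if i == j else corr[i][j] - loadings[i] * loadings[j]
             for j in range(n)] for i in range(n)]

# Hypothetical loadings of three tests on two orthogonal factors.
factor_1 = [0.8, 0.6, 0.0]
factor_2 = [0.0, 0.5, 0.7]
corr = [[None if i == j else factor_1[i] * factor_1[j] + factor_2[i] * factor_2[j]
         for j in range(3)] for i in range(3)]

first_residuals = residual_matrix(corr, factor_1)
second_residuals = residual_matrix(first_residuals, factor_2)
print([[None if v is None else round(v, 3) for v in row] for row in first_residuals])
```

In this error-free example the first residuals equal the second-factor cross-products and the second residuals vanish. With real data the residuals after the last factor give the check on the fit.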

The highest residual correlation coefficient in the first factor residual
matrix is between tests 3 and 4 and is .6928. Since the range
of residual coefficients for test 3 with the remaining tests is larger
than that for test 4, the factor loadings for the second factor are calculated
from the residual correlations of test 3 with the remaining
tests. The remaining communality for test 3 is found to be .751 by
means of the procedure for approximating the communality. The factor
loading for factor II upon test 3 is therefore .867. By multiplying
the first factor residuals of test 3 by the inverse of .867, the second-factor
loadings are obtained. These are given in Table 3.

* There is one contingency that this method of estimating the communality,
when the lines through the pairs of correlation coefficients do not meet at a common
point on the abscissa, does not guard against. When small residual matrices
are factored, there may be only one or two pairs of correlation coefficients available
for the determination of the point on the abscissa. In such an event, the sum
of the squares of the factor loadings for those tests whose communalities must
be estimated may be greater than 1. In some instances, therefore, it is necessary
to further modify this communality estimating procedure by using some value
smaller than that indicated by the procedure and which will result in the sum
of the squares of the factor loadings being less than 1.

A comparison can now be made between the assumed factor load- 
ings, the results obtained by Guilford with the use of the orthodox 
centroid procedure, and those obtained with the use of the pre-selec- 
tion procedure. All pertinent data are given in Table 3, including the 
results of one rotation for each procedure, although the size of the 
rotation in the pre-selection procedure is much smaller than that for 
the centroid method. While both analyses reproduce the assumed fac- 
tor loadings fairly well, it is apparent that the pre-selection proce- 
dure does so with least error. The second factor residuals using the 
centroid procedure vary from .018 to —.026, while those for the pre- 
selection procedure vary from .006 to —.022, each with one exception. 
The one exception—that of .052 for the centroid analysis, and .069 
for the pre-selection procedure, for 7,.—is due not to an error in either 
analysis but to an error in Guilford’s correlation matrix as given in 
Psychometric Methods, page 479, it being recorded as .49, whereas 
it should be .42. 

The pre-selection procedure, with its approximation procedure 
for determining the communality for those tests whose communality 
must be estimated, has reproduced the assumed factor loadings of a 
two-factor problem with greater accuracy than the orthodox proce- 
dure, without the necessity for the reversal of sign in the residual 
table, and in considerably less time. 

Since the Woodrow and Wilson four-factor hypothetical data in- 
volve nothing new, so far as the pre-selection procedure is concerned, 
only the four graphs involved in the determination of communality 
are given, and the results compared with those given by Woodrow and 
Wilson in Table 4. As can be seen in this table, the residuals vary 
from .049 to —.050 with the centroid method, from .169 to —.210 with 
the Woodrow and Wilson procedure, and are all .00 with the pre- 
selection procedure. Of these three analyses, only the pre-selection 
procedure reproduces the assumed factor loadings without error. No 
rotations of axes were needed with the pre-selection procedure or 
with the Woodrow and Wilson procedure, but rotations were neces- 
sary with the orthodox procedure. 


Extension of the Pre-Selection Procedure to Experimental Data 
The pre-selection procedure has been demonstrated to be capable 
of analyzing two correlation matrices of hypothetical factors, as well 
as the single-factor problem with which it began. The question can 
be raised as to whether the method can be employed successfully in 
dealing with experimental data where errors of one kind or another 
are also present. The major stumbling block to such a test of the
method is that with experimental data there is no absolute criterion
against which to check the results of the procedure. The best pro- 
cedure, at present, would seem to be to compare the results obtained 
with its use with those obtained from established procedures. Ac- 
cordingly, correlation matrices of a number of published experiment- 
al data which have been analyzed by means of established procedures 
and then rotated to simple orthogonal structure will be re-analyzed 
by the pre-selection procedure, and the results compared. In such 
comparison, it is well to bear in mind that the criterion used is not 
without error, introduced by the factoring technique itself, as has 
been shown by the presence of residuals with the two hypothetical 
problems already considered. 

The results from three sets of experimental data have been se- 
lected for comparison with the results obtained by the pre-selection 
procedure because they are readily available and their analyses re- 
ported completely. The first is Holzinger’s (2) eight physical meas- 
urements data, the second is Guilford’s (1) Army Alpha data, and the 
third is Holzinger’s (2) thirteen psychological variables data. 

Table 5 compares the result obtained by Holzinger, using Thur- 
stone’s centroid method, with rotations, with that obtained by the un- 
rotated pre-selection procedure, for the data involving eight physical 
measurements of individuals. As can be seen, with the unrotated pre- 
selection procedure, the first factor has fewer near zero-order load- 
ings than does the rotated centroid method. This is compensated for 
in the pre-selection procedure by more zero-order loadings in the sec- 
ond factor than in the centroid method with rotation. A slight rota- 
tion of either analysis would result in close agreement between the 
two analyses. 

Guilford’s Army Alpha data involve eight variables and he finds 
three orthogonal factors upon analysis. A comparison of his result 
with that of the pre-selection procedure is given in Table 6. Again 
there are minor discrepancies between the two procedures. However, 
there is no disagreement, from the standpoint of the identification of 
the factors, since the tests in which Guilford’s analysis shows a fac- 
tor to have high (or low) loadings are the same tests that the pre- 
selection procedure finds the factor to have high (or low) loadings. 

Holzinger’s thirteen-variable problem is analyzed by the averoid 
method, so that the result he obtains is not comparable, strictly speak- 
ing, to that which would be obtained with the usual centroid method. 
However, since the only difference between the two methods is in the 
manner of estimating communality, the factor loadings obtained by 
the two procedures are quite similar. Holzinger reports three factors 
when the data are analyzed by the averoid method. His analysis is compared
with that of the pre-selection procedure in Table 7. As before,
there are slight discrepancies, the most outstanding of which are the 
lesser number of zero-order loadings in factor I and larger number 
of zero-order loadings in factors II and III with the unrotated pre- 
selection procedure than with the rotated averoid method. Neverthe- 
less, there is clear agreement between the two analyses as to the tests 
in which each factor has its larger loadings. 


General Considerations 

It has been possible to show that the pre-selection procedure has 
been able to analyze correlational matrices of experimental data as 
well as correlational matrices derived from assumed factors. With 
the assumed data, the procedure has been demonstrated to be able to 
determine factor loadings that agree perfectly with the assumed fac- 
tor loadings in the case of the single-factor and four-factor problems 
considered, and with very minor discrepancies in the two-factor prob- 
lem presented. The precision of this procedure, in so far as these 
illustrative materials are concerned, is greater than that for the or- 
thodox procedures. In the case of the analyses of correlational ma- 
trices of experimental data, it has been possible to show a high degree 
of agreement with the results of orthodox procedures. In doing so, 
the results by the use of the pre-selection procedure have been ob- 
tained without the need for inversion of sign of negative residual 
correlations, with the estimation of only as many communalities as 
there are factors, and at a considerable saving of time. In only one 
instance was it desirable to make a slight rotation with the pre-se- 
lection procedure. 

There are a number of conditions when the use of pre-selection 
procedure might well be questioned. Since the pre-selection procedure 
pivots upon a single test and its correlations with the remaining tests 
to obtain a factor, factor loadings may be influenced by any error 
present to a greater degree than with orthodox procedures. But it is 
questionable whether there is any justification for subjecting any data
to any factoring procedure if they are known to be highly unreliable. 
The method will certainly be more effective when tests measure a 
single function, or a few functions, than when they are of complex 
nature. But the inclusion of tests of complex nature is not desirable 
from the standpoint of any factoring procedure, as has been pointed 
out repeatedly (4). If correlational matrices involve large negative 
correlations, either negative loadings, or larger negative residuals, 
can be expected. If this is considered undesirable, then the procedure 
ought not be used. On the other hand, the presence of negative 
correlation coefficients is likely to be reflected in negative factor
loadings, or residuals, even with the established procedures. Perhaps the
most serious consideration involves the estimation of communality; 
since only one communality per factor is estimated and the size of 
the factor loadings for the remaining tests is dependent upon this one 
estimate, errors of some size might be expected. On the basis of all 
the analyses made with the use of this procedure by the author, both 
those included in this report as well as unpublished material, this 
does not seem to be a major consideration. That there will be some 
variation with different estimates of communality there can be no 
doubt, but the variation is not large. There may be some criticism 
that this procedure always results in an orthogonal frame. The de- 
sirability of an orthogonal frame, as contrasted with an oblique one, 
is a moot question. The author’s position is that the orthogonal frame 
is most satisfying since the oblique structure implies something com- 
mon to factors which are oblique. There may be some criticism that 
this procedure does not always result in “simple structure” as de- 
fined (3). The only answer that can be given is that if simple struc- 
ture cannot be obtained after attempting rotations, it cannot be forced. 


Summary 

This paper describes a simple orthogonal multiple factor proce- 
dure involving no inversion of the sign of the negative residuals, the 
estimation of only as many communalities as there are factors, and 
no or only a few minor rotations of the axes to obtain a “meaningful” 
solution. It is called the pre-selection procedure because it involves 
selecting one of the two tests showing the highest correlation in
the correlational matrix to be factored and obtaining the first factor 
loadings from the correlations of this one test with the remaining 
tests. After the amount of correlation due to the first factor is sub- 
tracted from the correlational matrix, the second factor is obtained 
from the correlations of one of the two tests showing the highest resi- 
dual correlation with the remaining tests. Succeeding factors are ob- 
tained in the same manner. The validity of the method is shown by 
factoring hypothetical data and also by comparing its results using 
experimental data with those of the centroid method with rotations. 
It is proposed because considerably less time and work are involved 
with its use than with the centroid method with rotations. 
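
The procedure summarized above can be sketched in code. What follows is a minimal illustration under stated assumptions, not the author's worked computation. It assumes that the pivot test's loading is the square root of its estimated communality, and that every other loading is that test's (residual) correlation with the pivot test divided by the pivot loading. It takes the first-listed of the two tests in the highest pair as the pivot. The communality estimate is left to a caller-supplied function (`estimate_h2`, a hypothetical name), because the graphical estimation technique is not reproduced here. The example data are the single assumed factor of Table 2.

```python
import math

def pre_selection(R, estimate_h2, n_factors):
    """Sketch of the pre-selection procedure on a correlation matrix R (list of lists)."""
    n = len(R)
    residual = [row[:] for row in R]      # working copy; diagonal entries are never used
    loadings = [[0.0] * n_factors for _ in range(n)]
    pivots = []
    for k in range(n_factors):
        # Find the pair of tests with the highest (residual) correlation.
        best, pivot = -math.inf, None
        for i in range(n):
            for j in range(i + 1, n):
                if residual[i][j] > best:
                    best, pivot = residual[i][j], i   # assumption: first test of the pair
        h2 = estimate_h2(pivot, residual)  # communality still unaccounted for in the pivot test
        pivot_loading = math.sqrt(h2)
        column = [residual[pivot][j] / pivot_loading for j in range(n)]
        column[pivot] = pivot_loading
        # Subtract the correlation due to this factor from the off-diagonal entries.
        for i in range(n):
            for j in range(n):
                if i != j:
                    residual[i][j] -= column[i] * column[j]
            loadings[i][k] = column[i]
        pivots.append(pivot)
    return loadings, residual, pivots

# Data A (Table 2): one assumed factor with these loadings.
assumed = [1.00, 0.85, 0.55, 0.35, 0.15]
R = [[1.0 if i == j else a * b for j, b in enumerate(assumed)]
     for i, a in enumerate(assumed)]

# Test a becomes the pivot; its communality is taken as 1.00, as in Table 2.
loadings, residual, pivots = pre_selection(R, lambda pivot, res: 1.0, n_factors=1)
first_factor = [round(row[0], 2) for row in loadings]
print(first_factor)
```

With error-free single-factor data the sketch returns the assumed loadings and leaves no off-diagonal residual. With real data the result depends on the communality supplied for each pivot test, which is the sensitivity discussed under General Considerations.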


REFERENCES 
1. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936. Pp.
xvi + 566.
2. Holzinger, K. J. and Harman, H. H. Factor analysis. University of Chicago
Press, 1941. Pp. xii + 417.
3. Thurstone, L. L. The vectors of mind. University of Chicago Press, 1935.
Pp. xv + 266.
4. Thurstone, L. L. Current misuse of the factorial methods. Psychometrika,
1937, 2, 73-76.
5. Woodrow, H. and Wilson, L. A. A simple procedure for approximate factor
analysis. Psychometrika, 1936, 1, 245-258.


TABLE 1
Data A. Intercorrelations Between Hypothetical Tests
[Table body not recoverable from the scan.]


TABLE 2
Data A. Assumed Factor Loadings and Communalities
Test   Loading   Communality
a       1.00      1.0000
b        .85       .7225
c        .55       .3025
d        .35       .1225
e        .15       .0225


TABLE 3
Comparison of Assumed Orthogonal Factors, Centroid Analysis With Its
Rotation, and Pre-Selection Analysis with Its Rotation.
Guilford’s Data

Test  Assumed Factors  Centroid Analysis  Centroid Rotation  Pre-Selection Analysis*  Pre-Selection Rotation
      I   II   h2      I   II   h2        I   II             I   II   h2              I   II
1 .70 10 .50 60 42 72 12 69 .50 -70 .10 
2 60 0 .61 78 07 .61 60 .50 .57 54 .62 60 .51 
38 90 .20 .85 16 48 .81 88 .20 .380 87 .84 90 .20 
4 280 .00 .64 58 56 .65 80 .02 .09 80 .65 80 .00 
5 .20 80 .68 71 —45 .70 18 82 .8 10 .70 .81 
6 50 .70 .74 84 —-.16 .74 49 41 46 41 .75 50 .71 
7 80 .64 57 .65 00 81 —10 .66 00 81 
8 10 90 .82 69 —.56 .79 09 .88 .89 00 .79 10 = .88 
9 80 .10 .65 65 50 .67 81 .11 .19 79 .66 80 .10 
10 320 .70 .58 71 .61 28 .73 .74 21 .59 20 71 
Residuals
Magnitude            Centroid   Pre-Selection
 .050 to  .069          1†
 .030 to  .049          0           0
 .010 to  .029          6
-.010 to  .009         25          36
-.030 to -.011         13           8


* In all pre-selection analyses all factor loadings and residuals were calculated to four places;
factor loadings were rounded to two places, and residuals to three, for purposes of publication. As
is apparent, Guilford’s factor II was obtained as factor I with the pre-selection procedure. The
pre-selection factors, after rotation, are listed in inverted order, in order to facilitate comparison.

† This residual is for r16. Its size is due to an error in Guilford’s correlation matrix, where
r16 is given as .49, when it should be .42.


TABLE 4
Comparison of Assumed Orthogonal Factor Loadings, Centroid Analysis with
Orthogonal Rotation, Woodrow and Wilson Analysis, and Unrotated
Pre-Selection Analysis. Woodrow and Wilson Data
[Table body not recoverable from the scan.]


TABLE 5
Comparison of Orthogonally Rotated Centroid Analysis and Unrotated
Pre-Selection Analysis. Holzinger’s Physical Measurements Data

        Orthogonally Rotated     Unrotated Pre-Selection
        Centroid Analysis        Analysis
Test     I     II    h2           I     II    h2
1       .88   .26   .85          .88   .14   .79
2       .92   .20   .89          .97   .00   .93
3       .89         .88          .91   .03   .83
4       .86   .24   .80          .86   .11   .75
5       .25   .90   .87          .39   .92   .99
6       .19   .79   .66          .34   .69   .58
7       .14         .55          .29   .67   .54
8       .26   .68   .53          .43   .50   .44

Residuals
Magnitude            Centroid*   Pre-Selection
 .090 to  .109          0            1
 .070 to  .089          0            2
 .050 to  .069          0            0
 .030 to  .049          4            0
 .010 to  .029          4            2
-.010 to  .009          9           17
-.030 to -.011          9            0
-.050 to -.031          2            3
-.070 to -.051          0            3


* Centroid residuals calculated by author.


TABLE 6
Comparison of Orthogonally Rotated Centroid Analysis and Unrotated
Pre-Selection Analysis. Guilford’s Army Alpha Data

Test  Orthogonally Rotated      Unrotated Pre-Selection
      Centroid Analysis         Analysis*
      I   II   III   h2         I   II   III   h2
1 .10 46 39 38 .25 48 .29 37 
2 43 —.06 83 87 44 .00 84 20+ 
3 52 .26 31 48 52 .26 .29 42 
4 72 27 10 60 84 .00 00 12 
5 69 06 04 48 68 —.05 05 AT 
6 18 41 48 39 35 38 43 
46 —.04 82 63 67 —.23 907 
8 65 01 06 42 54 —.05 .08 30 
Residuals
Magnitude            Centroid   Pre-Selection
 .130 to  .149          0           1
 .110 to  .129          0           0
 .090 to  .109          1           0
 .070 to  .089          2           0
 .050 to  .069          2           0
 .030 to  .049         10           1
 .010 to  .029          2           0
-.010 to  .009          6          19
-.030 to -.011          5           2
-.050 to -.031          3           3
-.070 to -.051          3           1
-.090 to -.071          2           0
-.110 to -.091          0           0
-.130 to -.111          0           0
-.150 to -.131          0           1


* With the pre-selection procedure, Guilford’s III was obtained as the second factor, and
Guilford’s II as the third factor. In the table, pre-selection factors II and III are inverted so as to
facilitate comparison with the centroid rotated analysis.

† In this small correlation table, there were insufficient data to use the pre-selection method for
estimating the size of the communality for tests 2 and 7, selected to pivot upon for factors II and
III. h2 was arbitrarily made .90.


TABLE 7
Comparison of Orthogonally Rotated Averoid Analysis and Unrotated
Pre-Selection Analysis. Holzinger’s Thirteen Psychological Tests Data

Orthogonally Rotated
Averoid Analysis
I II III h2
17 .22 64 49 
10 12 41 .20 
14 03 55 32 
.20 14 -50 ol 
32 ol .66 
21 .26 .65 
-76 24 .20 .68 
58 29 35 
aa 19 .20 67 
15 -70 —.13 53 
13 67 11 48 
—.05 .68 16 49 
.09 63 42 58 
Magnitude            Averoid
 .130 to  .149
 .110 to  .129          0
 .090 to  .109          3
 .070 to  .089          5
 .050 to  .069          7
 .030 to  .049          9
 .010 to  .029         17
-.010 to  .009         18
-.030 to -.011          5
-.050 to -.031          8
-.070 to -.051          4
-.090 to -.071
-.110 to -.091          0
-.130 to -.111          0
-.150 to -.131          0
-.170 to -.151          0


Unrotated Pre-Selection 
Analysis 
II III h2 
38 .05 54 
.22 2 19 
21 —14 53 04 
52 Al 
83 18 .00 12 
05 04 
-79 -00 
61 .20 15 44 
.87 .00 -00 -76 
.20 83 .00 12 
51 .26 43 
13 .68 387 61 
82 42 55 .58 
Residuals 
Pre-Selection 

0 

1 

2 

5 

3 

5 

3 

38 

6 

2 

3 

2 

0 

1 


FIGURE 1
Technique for Estimating the Communality of a Test
[Graph not reproduced; axis label: CORRELATION.]


FIGURE 2
Modified Technique for Estimating the Communality of a Test When More
Than One Factor and Chance Errors Are Present.
[Graph not reproduced; axis label: CORRELATION.]


FIGURE 3
Graph to Estimate Communality for Test 19, Factor I. Woodrow and Wilson
Data.
[Graph not reproduced; axis label: CORRELATION.]


FIGURE 4
Graph to Estimate Communality for Test 4, Factor II. Woodrow and Wilson
Data.
[Graph not reproduced; axis label: CORRELATION.]


FIGURE 5
Graph to Estimate Communality for Test 11, Factor III. Woodrow and Wilson
Data.
[Graph not reproduced.]


FIGURE 6
Graph to Estimate Communality for Test 15, Factor IV. Woodrow and Wilson
Data.
[Graph not reproduced.]


INDEX FOR VOLUME 10 


AUTHORS 
Ansbacher, H. L., (with K. Mather), “Group Differences in Size
Estimation.” 37-56.


Beall, Geoffrey, “Approximate Methods in Calculating Discriminant 
Functions.” 205-217. 


Benjamin, Kurt, “An I.B.M. Technique for the Computation of ΣX²
and ΣXY.” 61-67.

Carlson, Hilding B., “A Simple Orthogonal Multiple Factor Approxi- 
mation Procedure.” 283-301. 

Carroll, John B., “The Effect of Difficulty and Chance Success on Cor- 
relations Between Items or Between Tests.” 1-19. 

Conrad, H. S., “A Review of ‘A Factorial Study of Perception’ by L. 
L. Thurstone.” 69-71. 

Crawford, Isabelle, (with D. M. Hall, and E. L. Welker), “Factor 
Analysis Calculations by Tabulating Machines.” 93-125. 

Davis, Frederick B., “The Reliability of Component Scores.” 57-60. 


Dunlap, Jack W., (with Donald W. Fiske), “A Graphical Test for the 
Significance of Differences between Frequencies from Different 
Samples.” 225-229. 

Dyer, Henry S., “The Usability of the Concept of ‘Prejudice.’” 219- 
224. 

Fiske, Donald W., (with Jack W. Dunlap), “A Graphical Test for the 
Significance of Differences between Frequencies from Different 
Samples.” 225-229. 

Gulliksen, Harold, “The Relation of Item Difficulty and Inter-item
Correlation to Test Variance and Reliability.” 79-91.

Guttman, Louis, “A Basis for Analyzing Test-Retest Reliability.” 255- 
282. 

Hall, D. M., (with E. L. Welker, and Isabelle Crawford), “Factor 
Analysis Calculations by Tabulating Machines.” 93-125. 

Holzinger, Karl J., “Interpretation of Second-order Factors.” 21-25. 

Holzinger, Karl J., “Spearman as I Knew Him.” 231-235.


Hoyt, Cyril J., “Testing Linear Hypotheses Illustrated by a Simple 
Example in Correlation.” 199-204. 


Johnson, Palmer O., (with Fei Tsao), “Factorial Design and
Covariance in the Study of Individual Educational Development.”
133-162.

Kaitz, Hyman B., “A Note on Reliability.” 127-131.


Mather, K., (with H. L. Ansbacher), “Group Differences in Size Esti- 
mation.” 37-56. 


Psychometric Corporation, Report of the Treasurer. Appendix B, ii. 


Psychometric Society, Report of Committee on Reorganization and 
Divisional Status. Appendix C, iii. 


Psychometric Society, Report of the Treasurer. Appendix A, i. 


Reiner, J. M., (with S. Spiegelman), “A Note on Steady States and 
the Weber-Fechner Law.” 27-35. 


Spiegelman, S., (with J. M. Reiner), “A Note on Steady States and 
the Weber-Fechner Law.” 27-35. 


Thurstone, L. L., “A Multiple Group Method of Factoring the Corre- 
lation Matrix.” 73-78. 

Thurstone, L. L., “The Effects of Selection in Factor Analysis.” 165- 
198. 

Thurstone, L. L., “The Prediction of Choice.” 237-253. 

Tsao, Fei, (with Palmer O. Johnson), “Factorial Design and Covari- 
ance in the Study of Individual Educational Development.” 133- 
162. 


Welker, E. L., (with D. M. Hall, and Isabelle Crawford), “Factor 
Analysis Calculations by Tabulating Machines.” 93-125. 


PSYCHOMETRIC SOCIETY 
REPORTS 


APPENDIX A 
REPORT OF THE TREASURER OF THE 
PSYCHOMETRIC SOCIETY 
JULY 1, 1942, TO JUNE 30, 1945 
Balance on hand, July 1, 1942                                      $ 233.84
Receipts, July 1, 1942 to June 30, 1945
   As of year 1941
        2 regular members                       $  10.00
   As of year 1942
       27 regular members                         135.00
        7 student members                          21.00
   As of year 1943
      202 regular members                        1010.00
        8 student members                          24.00
        1 irregular member                          2.50
   As of year 1944
      197 regular members                         985.00
       11 student members                          33.00
   As of year 1945
      178 regular members                         890.00
        9 student members                          27.00
   As of year 1946
        1 regular member                            5.00            3142.50

                                                                   $3376.34
Expenditures, July 1, 1942 to June 30, 1945
   Postage                                         39.18
   Printing                                         9.50
   Psychometric Corporation                      2822.63
   Miscellaneous Expenses                           5.95            2877.26

Balance on hand, June 30, 1945                                     $ 499.08


APPENDIX B 
REPORT OF THE TREASURER OF THE 
PSYCHOMETRIC CORPORATION 
JULY 1, 1942, TO JUNE 30, 1945 


Balance on hand, July 1, 1942                                      $2755.03
Receipts, July 1, 1942 to June 30, 1945
   Charges to authors                           $  41.44
   Psychometric Society                          2822.63
   Royalties: Psychometric Monographs              17.04
   Subscriptions and Sales
      Individual                      881.65
      Library                        3381.50     4263.15            7144.26

                                                                   $9899.29
Expenditures, July 1, 1942 to June 30, 1945
   American Bonding Company                        22.50
   Editorial Costs                                404.24
   Office Supplies                                 14.46
   Postage                                         90.60
   Printing Costs                                3589.08
   Refunds due to overpayments and
      cancellations                                22.50
   Security Storage Company                        22.20
   Miscellaneous Expenses                          33.52            4199.10

Balance on hand, June 30, 1945                                     $5700.19

APPENDIX C 


REPORT TO THE PSYCHOMETRIC SOCIETY BY ITS COMMITTEE 
ON REORGANIZATION AND DIVISIONAL STATUS 


Historical 


The Psychometric Society was founded in 1935. Psychometrika, the official
journal of the Society, began publication in March, 1936. This Journal is not
owned by the Society, but by the Psychometric Corporation—a nonprofit organiza- 
tion that provided the original capital for the establishment of the Journal. There 
is a contract which provides that the Psychometric Society pay 90 per cent of its 
membership dues to the Psychometric Corporation, in return for which the Cor- 
poration will provide each member of the Psychometric Society with a subscrip- 
tion to Psychometrika. The Psychometric Society has a membership of around 
200 (a few of whom are students and pay $3 instead of $5 dues) and thus has 
had only about $100 a year to finance all of its activities other than Psychometrika. 
It is to the credit of its officers that, for nearly a decade, it has been able to oper- 
ate within such a small budget and even to accumulate a small surplus. Partially 
because of additional income from its library subscriptions, the Psychometric Cor- 
poration has also been able to operate well within its income and to accumulate 
a substantial reserve. 
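
The arithmetic behind these figures can be checked with a short sketch. It assumes a round membership of 200, all paying the regular $5 dues; the report says only "around 200", with a few members at the $3 student rate, so the result is approximate. The function name is illustrative.

```python
def split_dues(dues_cents, corporation_share_percent=90):
    """Split one member's dues between the Corporation and the Society (in cents)."""
    to_corporation = dues_cents * corporation_share_percent // 100
    return to_corporation, dues_cents - to_corporation

regular = split_dues(500)   # $5.00 regular dues
student = split_dues(300)   # $3.00 student dues

# Assumption: 200 members, all at the regular rate.
society_income_dollars = 200 * regular[1] / 100
print(regular, student, society_income_dollars)
```

Under these assumptions the Corporation receives $4.50 of each regular member's dues and $2.70 of each student's, and the Society keeps about $100 a year.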


The Psychometric Society became affiliated with the American Psychological 
Association on September 5, 1935. After that time, the A.P.A. amended its By- 
Laws to read “... the principal officers and the governing board of each affiliated 
organization must be Members or Associates of the American Psychological Asso- 
ciation.” This restriction may not apply to the Psychometric Society, since it 
was put into effect without the advice or consent of the Psychometric Society, and 
the Psychometric Society, without the advice or consent of the A.P.A., has elected
at least one principal officer who was not a Member or Associate of the A.P.A. 


The Psychometric Society became affiliated with the American Association 
for the Advancement of Science on April 17, 1937. As a result of this affiliation, 
the Psychometric Society is entitled to one representative on the Council of that 
Association. When the Psychometric Society chooses to meet jointly with the 
American Association for the Advancement of Science, as it did in 1944, programs
are printed and meeting rooms and necessary equipment are provided by the 
American Association for the Advancement of Science at no cost, direct or in- 
direct, to the Psychometric Society. 


Invitation to Become a Division of the A.P.A. 


The Psychometric Society, along with other societies, was invited to (and 
did) send representatives to the Intersociety Constitutional Convention of Psy- 
chologists which met in New York City on May 29-31, 1943. Later a Joint Con- 
stitutional Committee (composed solely of representatives of the American Psy- 
chological Association and of the American Association for Applied Psychology) 
continued the work begun by the Intersociety Convention and in June, 1944, pre- 
pared a set of “Proposed By-Laws for the American Psychological Association.” 
These By-Laws provide that one of the charter Divisions of the Association shall 
be “The Psychometric Society—A Division of the American Psychological Asso- 
ciation.” These By-Laws were approved by both the A.P.A. and the A.A.A.P. on 
September 12, 1944. On November 14, 1944, the A.P.A. extended to the Psycho- 
metric Society a formal invitation to become a Charter Division of the reorganized 
American Psychological Association. 


Activities of This Committee 
On October 5, 1944, President Harold Gulliksen appointed the following Com- 
mittee on Reorganization and Divisional Status: Col. John C. Flanagan, Dr. Irv- 
ing Lorge, Lt. Col. M. W. Richardson, Dean P. J. Rulon, Dr. L. L. Thurstone, and 
Dr. Albert K. Kurtz, Chairman. 




Both the President of the Society and the Chairman of your Committee dis- 
cussed the proposed affiliation with a few members of this Committee and with 
other members of the Society. After some correspondence, a meeting of your 
Committee was held in New York City on November 21, 1944. This meeting was 
attended by Gulliksen, Lorge, Richardson, and Kurtz, the other members being 
unable to attend. The conclusions reached at that meeting were written up and 
submitted to all Committee members. Revisions were made and controversial 
topics were eliminated, so that the present report is agreed to in principle, and 
perhaps even in toto, by each member of your Committee. 


Conclusions

Your Committee carefully considered the past and present financial status 
of both the Society and the Psychometric Corporation. (See Reports of the Treas- 
urer of the Psychometric Society and of the Treasurer of the Psychometric Cor- 
poration.) It was felt that the finances had been handled unusually well. It was 
doubted that the affairs of these two organizations would have been run or could 
be run so economically as a part of the A.P.A., especially if Psychometrika had 
been edited, printed, and published in the same manner in which A.P.A. journals 
are now handled. (See Supplement to Appendix C.) 


Your Committee considered the “Proposed By-Laws for the American Psy- 
chological Association” dated June, 1944. It was soon decided that it would not 
be desirable for the Society to become a Division under the precise terms
in those By-Laws. However, your Committee was given to understand
that these By-Laws are still subject to change. The following conclusions outline 
the conditions under which your Committee believes it might be desirable for the 
Society to become a Division. If these conditions are agreed to by the member- 
ship of the Society and are acceptable to the A.P.A., arrangements can then be 
made for changing the status of the Society. 


The following are the conclusions of your Committee: 

1. There is no financial gain to the Society through its becoming a Division 
of the A.P.A. 

2. There may be some professional advantage to the members of the Society,
and to psychologists in general, provided that the Psychometric Society 
does not sacrifice its autonomy in becoming a Division of the A.P.A. 

3. If the Society becomes a Division of the A.P.A., the present contract
between the Society and the Psychometric Corporation must remain in
effect (or an essentially similar contract, acceptable to both parties,
must be substituted).

4. As in the past, every member of the Psychometric Society must be re- 
quired to subscribe to Psychometrika, through an allotment of dues. This 
would also apply to all new members of the Psychometric Division. 

5. The dues schedule described in Article XIX of the Proposed By-Laws for 
the A.P.A. will have to be altered so as to provide for “non-affiliates” of 
the A.P.A. The various classes of members will then pay dues accord- 
ing to the dues schedule shown in the table, Proposed Dues Schedule. 

6. If the Society becomes an A.P.A. Division, it shall have the right to re- 
tain, as nonreverting funds, the present assets of the Psychometric So- 
ciety, and to accumulate additional nonreverting funds. 

7. If the Society becomes an A.P.A. Division, it shall have the right to buy
or accept as a gift from the Psychometric Corporation either the journal,
Psychometrika, or any other assets of the Psychometric Corporation and
to retain and control such journal or other assets as nonreverting funds,
and to accumulate additional nonreverting funds. Under such
circumstances, the Society rather than the entire A.P.A., or the A.P.A. Board
of Editors, would also retain complete editorial and other control over
Psychometrika.

8. If the Society becomes an A.P.A. Division, it shall have the right to affili- 
ate with any other organizations and to maintain existing affiliations (ex- 
cept that with the A.P.A., which would presumably be canceled by be- 
coming a Division.) 




PROPOSED DUES SCHEDULE


Fellow (with 1 or more     Member           $15.00    $5.00   $20.00    $15.00    $1.00   $4.00   $0.50   $4.50
  other affiliations)                       or more           or more   or more
Asso. (with other affil.)  Member            10.00     5.00    15.00     10.00     1.00    4.00    0.50    4.50
                                            or more           or more   or more
Asso. (with other affil.)  Student Member    10.00     3.00    13.00     10.00     1.00    2.00    0.30    2.70
                                            or more           or more   or more
Student affiliate          Member             5.00     5.00    10.00      5.00     0.00    5.00    0.50    4.50
Student affiliate          Student Member     5.00     3.00     8.00      5.00     0.00    3.00    0.30    2.70
Division affiliate         Member             2.00     5.00     7.00      2.00     0.00    5.00    0.50    4.50
Division affiliate         Student Member     2.00     3.00     5.00      2.00     0.00    3.00    0.30    2.70
Non-affiliate              Member             0.00     5.00     5.00      0.00     0.00    5.00    0.50    4.50
Non-affiliate              Student Member     0.00     3.00     3.00      0.00     0.00    3.00    0.30    2.70


The question of participation in the proposed Central Organization for the
Statistical Societies was discussed briefly but not settled. It is recommended that
your Committee be continued in order to take up this matter as well as to
consider further the possibility and the desirability of the Psychometric Society’s
becoming a Division of the American Psychological Association.


Respectfully submitted, 


JOHN C. FLANAGAN
HAROLD GULLIKSEN, ex officio
IRVING LORGE
M. W. RICHARDSON
P. J. RULON
L. L. THURSTONE
ALBERT K. KURTZ, Chairman


Supplement to Appendix C 


The data in the following table are taken from pages 781-798 of the Decem- 
ber, 1944, Psychological Bulletin and from records of the Psychometric Corpora- 
tion. It is realized that many factors arise to complicate the comparisons. E.g., 
the Psychological Abstracts and the Psychological Monographs have a large page 
size, while Psychometrika contains a great many pages of formulas and tables. 
Also, all expenses of the Psychometric Corporation (whether related to
Psychometrika or not) are here charged to Psychometrika.


There is a difference in the computation of circulation figures. Reprints of 
articles published in A.P.A. journals are ordered and paid for individually and 
are presumably excluded from circulation figures. Every Psychometrika author
is given 200 reprints free. These are included in the Psychometrika circulation 
figures because they are paid for by that Journal. (If the reader disagrees with 
this reasoning, subtract 200.) 


Finally, printing costs on a per page basis are not comparable when there 
are large differences in circulation. Since Psychometrika has a small circulation, 
the last row is inserted for comparative purposes only. It was obtained by
averaging the 1943 and 1944 figures, determining additional printing costs from our
present printing contract, and assuming that editorial and all other expenses 
would double if the circulation were doubled. 


This last row shows (a) that Psychometrika’s estimated cost of $6.31 per
page would still be well under the $7.25 to $8.80 costs of the most nearly
comparable A.P.A. journals and (b) that its cost of $0.39 per page per 100
circulation would also be lower than that of any comparable A.P.A. journal.


                                       Printing   Editorial    Total              Cost per                 Cost per page
Journal                         Year   costs      and other    cost      Pages    page      Circulation   per 100
                                                  expenses                                                circulation
Psychol. Bull.                  1943   $4,878     $2,659       $7,537      800    $ 9.42    3,474         $0.27
Psychol. Abs.                   1943    6,178      7,780       13,958      521     26.79    3,649          0.73
J. Abn. & Soc. Psychol.         1943    4,099      1,475        5,574      769      7.25    1,569          0.46
Psychol. Rev.                   1943    3,477*     1,218        4,695      634      7.41    1,543          0.48
J. Exp. Psychol.                1943    6,547      2,097        8,644    1,039      8.32    1,298          0.64
J. App. Psychol.                1943    3,281      1,557        4,838      550      8.80    1,700 (?)      0.52 (?)
Psychol. Monog.                 1943    2,744        752        3,496      325     10.76    1,400          0.77
Psychometrika                   1943    1,090        199        1,289      272      4.74      800          0.59
Psychometrika                   1944    1,247        169        1,416      286      4.95      800          0.62
Psychometrika, estimate
  for larger circulation                1,392        368        1,760      279      6.31    1,600          0.39


* Printed as $3746.82 on page 785, but as $3476.82 on page 792.
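
The two derived figures in the table can be reproduced with a short sketch. It assumes that cost per page is total cost divided by pages, and that cost per page per 100 circulation is that figure divided by circulation in hundreds. The function names are illustrative; the inputs are the Psychometrika rows of the table.

```python
def cost_per_page(total_cost, pages):
    """Total cost of the journal divided by the number of pages printed."""
    return total_cost / pages

def cost_per_page_per_100(total_cost, pages, circulation):
    """Cost per page divided by circulation expressed in hundreds of copies."""
    return cost_per_page(total_cost, pages) / (circulation / 100)

rows = {
    "Psychometrika, 1943": (1289, 272, 800),
    "Psychometrika, estimate for larger circulation": (1760, 279, 1600),
}

results = {
    name: (round(cost_per_page(total, pages), 2),
           round(cost_per_page_per_100(total, pages, circ), 2))
    for name, (total, pages, circ) in rows.items()
}
for name, (per_page, per_page_per_100) in results.items():
    print(name, per_page, per_page_per_100)
```

For the 1943 row this gives $4.74 per page and $0.59 per page per 100 circulation; for the estimate row, $6.31 and $0.39.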



emphasis is to be placed on articles of type (1), in so far as articles of this
type are available.


In the selection of the articles to be printed in Psychometrika, an effort is made 
to obtain objectivity of choice. All manuscripts are received by one person who 
first removes from each article the name of contributor and institution. The 
article is then sent to three or more persons who make independent judgments 
upon the suitability of the article submitted. This procedure seems to offer a 
possible mechanism for making judicious and fair selections. 

Prospective authors are referred to the “Rules for Preparation of Manuscripts 
for Psychometrika,” contained in the June, 1944 issue. Reprints of these 
“Rules” are available upon request. A manuscript which fails to comply with 
these requirements will be returned to the author for revision. 

Authors will be charged for cuts. They will receive 200 reprints without covers, 
free of charge. 


Manuscripts for publication in Psychometrika should be sent to
HAROLD GULLIKSEN 
Managing Editor Psychometrika 
College Entrance Examination Board, Box 592 
Princeton, New Jersey. 


The officers of the Psychometric Society for the year 1946 are as follows: 
President: Edward E. Cureton, Chief, Civilian Personnel Research Subsection, 
AGO, War Dept., 270 Madison Ave., New York 16, N. Y.; Secretary: Harold 
Edgerton, Associate Professor, Department of Psychology, Ohio State University, 
Columbus, Ohio; Treasurer: Irving D. Lorge, Associate Professor of Education, 
Institute of Educational Research, Columbia University, New York, New York. 


The council members, together with date at which term expires, are as follows: 
J. W. Dunlap, 1948; Harold Gulliksen, 1948; Paul Horst, 1947; L. L. Thurstone, 
1947; A. K. Kurtz, 1946; J. P. Guilford, 1946. 


Editorial Council: 
Chairman:—L. L. Thurstone 
Editors:—A. K. Kurtz, M. W. Richardson 
Managing Editor:—Harold Gulliksen 
Assistant Managing Editor:—Dorothy C. Adkins 


Editorial Board:— 

H. S. CONRAD          PAUL HORST               P. J. RULON
ELMER A. CULLER       ALSTON S. HOUSEHOLDER    CHARLES SPEARMAN
E. E. CURETON         CLARK HULL               WM. STEPHENSON
JACK W. DUNLAP        TRUMAN L. KELLEY         S. A. STOUFFER
MAX D. ENGELHART      A. K. KURTZ              GODFREY THOMSON
HENRY E. GARRETT      IRVING LORGE             L. L. THURSTONE
J. P. GUILFORD        QUINN MCNEMAR            LEDYARD TUCKER
HAROLD GULLIKSEN      CHARLES I. MOSIER        S. S. WILKS
CHARLES M. HARSH      NICOLAS RASHEVSKY        HERBERT WOODROW
K. J. HOLZINGER       M. W. RICHARDSON


PRINTED BY THE DENTAN PRINTING CO., COLORADO SPRINGS, COLORADO 



